feat: batch OCR processing for all manuscript pages #120
Open
Labels
- area:studio: Studio workspace and tabs
- minor: Increments the minor version when adding new functionality in a backward-compatible manner.
- priority:P1: High priority
- status:ready: Ready to be implemented
- type:feature: New user-facing feature
Summary
Add a "Run OCR on all pages" action that processes every page of a manuscript in sequence (or in parallel batches) and shows a real-time progress bar.
Motivation
Today OCR runs one page at a time via HTMX polling. For a 200-page manuscript this means 200 manual clicks. Batch processing is the single biggest UX improvement for transcription workflows.
Proposed approach
- POST /studio/{manifest_id}/ocr/batch endpoint that queues all un-OCR'd pages as a single job.
- JobManager / export job infrastructure for progress tracking.
Acceptance criteria
Technical notes
- max_concurrent_ocr setting in config.
- src/universal_iiif_core/services/export/service.py job pattern is a good template.
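To make the notes above concrete, here is a minimal sketch of how a batch job could cap parallelism with max_concurrent_ocr while tracking progress for a polling UI. It is not taken from the codebase: BatchProgress, run_batch_ocr, fake_ocr_page, and the already_ocrd parameter are hypothetical names, and the real JobManager and export job pattern may look quite different.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class BatchProgress:
    """Hypothetical progress record that a job manager could expose for polling."""
    total: int
    done: int = 0
    failed: list[int] = field(default_factory=list)

    @property
    def percent(self) -> float:
        # An empty batch counts as complete.
        return 100.0 * self.done / self.total if self.total else 100.0


async def fake_ocr_page(page: int) -> str:
    """Stand-in for the real per-page OCR call (assumption, not the project API)."""
    await asyncio.sleep(0)
    if page < 0:
        raise ValueError("invalid page")
    return f"text for page {page}"


async def run_batch_ocr(pages, already_ocrd=frozenset(),
                        ocr_page=fake_ocr_page, max_concurrent_ocr=4):
    """Run OCR on every page not yet processed, at most max_concurrent_ocr at a time."""
    pending = [p for p in pages if p not in already_ocrd]
    progress = BatchProgress(total=len(pending))
    results = {}
    limiter = asyncio.Semaphore(max_concurrent_ocr)

    async def worker(page):
        async with limiter:  # blocks while max_concurrent_ocr workers are active
            try:
                results[page] = await ocr_page(page)
            except Exception:
                # Record the failure and keep going so one bad page
                # does not abort the whole batch.
                progress.failed.append(page)
            finally:
                progress.done += 1

    await asyncio.gather(*(worker(p) for p in pending))
    return progress, results


if __name__ == "__main__":
    progress, results = asyncio.run(
        run_batch_ocr([1, 2, 3], already_ocrd={2}, max_concurrent_ocr=2)
    )
    print(progress.done, "of", progress.total, "pages processed")
```

In this sketch a failed page is recorded and counted as processed rather than aborting the batch. Whether the real job should retry, skip, or stop on failure is a design choice the issue leaves open.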