Home

IIIF Print, in Plain English

What it is, in one breath

IIIF Print is a plug-in (a Rails "engine") for Hyrax and Hyku digital repositories. It takes the documents and scanned images you put into a repository and makes them readable, browsable, searchable, and zoomable in a proper page-turning viewer. It is the layer that turns "we have a pile of scanned files" into "the public can flip through, search inside, and read our collection."

It is not a standalone app. It rides on top of an existing Hyrax or Hyku site. Think of it as the engine that handles the hard parts of showing digitized material well.

The core problem it solves

Libraries and archives digitize things: newspapers, bound journals, manuscripts, reports, photo sets. After scanning, you are usually holding one of two awkward objects:

A giant multi-page PDF that a user can only download and squint at.
A folder of loose page images with no structure connecting them.

Neither of those is a good experience. Nobody wants to download a 400 MB PDF of a newspaper run to find one article, and a loose pile of TIFFs tells the user nothing about what goes with what.

IIIF Print exists to close that gap. It takes those raw scans and produces a real reading experience: a page-by-page viewer with deep zoom, full-text search across every page, and the matching word highlighted right on the image where it appears.

The specific problems, one at a time

"I have a multi-page PDF and I want people to actually read it"

Drop in a PDF and IIIF Print can split it into individual pages, convert each page into a display-friendly compressed TIFF, and present the whole thing in the Universal Viewer with page navigation and deep zoom. The user flips pages and zooms into fine detail instead of downloading one enormous file.
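
As a rough illustration of the splitting step, the sketch below builds (but does not run) a Poppler `pdftoppm` command that writes one LZW-compressed TIFF per page. It is not necessarily the command IIIF Print itself issues; the function name, the 300 DPI default, and the output naming are assumptions made for the example.

```python
from pathlib import Path


def pdftoppm_command(pdf_path, out_dir, dpi=300):
    """Build a pdftoppm argument list that renders each PDF page to a TIFF.

    Hypothetical helper: IIIF Print's own splitter may use other tools or settings.
    """
    prefix = Path(out_dir) / Path(pdf_path).stem  # e.g. pages/issue -> pages/issue-1.tif
    return [
        "pdftoppm",
        "-tiff",                    # write one TIFF file per page
        "-tiffcompression", "lzw",  # lossless compression keeps files smaller
        "-r", str(dpi),             # render resolution in DPI (assumed value)
        str(pdf_path),
        str(prefix),
    ]


cmd = pdftoppm_command("issue.pdf", "pages")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` on a host where Poppler-utils is installed.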

"My scans are pictures of text, and the computer can't read them"

A scanned page is just an image. The words on it are not text a search engine can find. IIIF Print runs OCR (via Tesseract) to generate a text layer and ALTO files, which record not only what the words are but where each word sits on the page. That word-location data is what makes the next feature possible.
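
To make "where each word sits" concrete, here is a minimal sketch that reads word text and bounding boxes out of an ALTO document using the standard `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, and `HEIGHT` attributes of ALTO `String` elements. The sample XML and the `alto_words` helper are invented for illustration and are not part of IIIF Print.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written ALTO sample (illustrative data, not real OCR output)
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Harbor" HPOS="120" VPOS="340" WIDTH="210" HEIGHT="48"/>
    <String CONTENT="News" HPOS="345" VPOS="340" WIDTH="150" HEIGHT="48"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def alto_words(alto_xml):
    """Return a list of (word, (x, y, width, height)) tuples from ALTO XML."""
    words = []
    for element in ET.fromstring(alto_xml).iter():
        # Ignore the namespace so the sketch works across ALTO versions
        if element.tag.rsplit("}", 1)[-1] != "String":
            continue
        box = tuple(
            int(float(element.get(key)))
            for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
        )
        words.append((element.get("CONTENT"), box))
    return words


print(alto_words(ALTO_SAMPLE))
```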

"I want people to search inside the documents, not just the titles"

Because IIIF Print indexes the OCR full text, a user can search for a phrase and find it inside the body of a scanned page, not only in the catalog metadata. When a match is found, the viewer highlights the matching word right on the image. Searching inside the content is the headline capability for newspaper, journal, and manuscript collections.
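
The highlighting idea can be sketched as follows: given words with pixel coordinates, each match becomes a canvas URI with an `#xywh=` fragment, which is how IIIF targets a region of an image. The function name, the sample words, and the canvas URI are assumptions for the example; the real lookup in IIIF Print goes through the search index rather than a Python list.

```python
def highlight_targets(words, query, canvas_uri):
    """Return canvas region URIs for every word that matches the query.

    `words` is a list of (text, (x, y, width, height)) tuples.
    """
    needle = query.casefold()
    return [
        f"{canvas_uri}#xywh={x},{y},{w},{h}"
        for text, (x, y, w, h) in words
        # Strip surrounding punctuation so "news," still matches "news"
        if text.strip(".,;:!?\"'()").casefold() == needle
    ]


# Illustrative word list for one page
page_words = [
    ("Harbor", (120, 340, 210, 48)),
    ("news,", (345, 340, 150, 48)),
    ("harbor", (90, 900, 200, 46)),
]

targets = highlight_targets(page_words, "harbor", "https://example.org/canvas/p1")
print(targets)
```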

"A newspaper issue has many pages, but it is one thing"

Real collections are hierarchical. A newspaper issue (the parent) contains many pages or articles (the children). A bound volume contains chapters. IIIF Print models this as parent and child works and shows them together in a single viewer. A search on the parent reaches down into all of its children, so a hit on page 12 surfaces even though the user is looking at the issue as a whole.
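
A toy model of that behavior, assuming an in-memory tree in place of the repository and its search index: searching a parent walks down through its children and reports which ones contain the term. The `Work` class and `search_within` function are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    """Simplified stand-in for a repository work (not a Hyrax class)."""
    title: str
    full_text: str = ""
    children: list = field(default_factory=list)


def search_within(work, query):
    """Return titles of the work and any descendants whose text contains the query."""
    hits = []
    if query.lower() in work.full_text.lower():
        hits.append(work.title)
    for child in work.children:
        hits.extend(search_within(child, query))
    return hits


issue = Work(
    "Issue 1",
    children=[
        Work("Page 1", "Council meets on Tuesday"),
        Work("Page 12", "Harbor dredging approved"),
    ],
)

print(search_within(issue, "harbor"))
```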

"I don't want every single page cluttering up my search results"

If every individual page became its own catalog result, search would be a mess of thousands of fragments. IIIF Print lets you exclude specified work types from the main catalog search, so the issue shows up as a result while its hundreds of child pages stay out of the way.
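
One common way to express this kind of exclusion in a Solr-backed catalog is a negative filter query on the model field. The sketch below only builds such filter strings. The `has_model_ssim` field name and the `NewspaperPage` model name are assumptions for illustration, and IIIF Print's actual configuration options are not shown here.

```python
def exclusion_filters(excluded_models, field="has_model_ssim"):
    """Build Solr filter queries that hide the given work types from results."""
    return [f'-{field}:"{model}"' for model in excluded_models]


# Hypothetical child work types to keep out of the main catalog search
filters = exclusion_filters(["NewspaperPage", "NewspaperArticle"])
print(filters)
```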

"I want richer information shown inside the viewer"

IIIF Print can add metadata fields to the IIIF manifest and render them in the viewer, including faceted search links. A user reading a page can see its date, creator, or subject, and click straight through to everything else that shares that value.
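
As a sketch of what such an entry might look like, the code below builds one label/value pair in the shape used by the `metadata` list of an IIIF Presentation v2 manifest, with each value wrapped in a link to a Blacklight-style facet URL. The helper name, the `subject_sim` field, and the catalog URL are assumptions; the markup IIIF Print actually emits may differ.

```python
from html import escape
from urllib.parse import urlencode


def metadata_entry(label, facet_field, values, catalog_url):
    """Build one manifest metadata entry whose values link to a faceted search."""
    links = []
    for value in values:
        # Blacklight-style facet parameter: f[field][]=value
        query = urlencode({f"f[{facet_field}][]": value})
        href = escape(f"{catalog_url}?{query}")
        links.append(f'<a href="{href}">{escape(value)}</a>')
    return {"label": label, "value": links}


entry = metadata_entry(
    "Subject", "subject_sim", ["Harbors"], "https://example.org/catalog"
)
print(entry)
```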

"I want my images served from a dedicated image server"

For performance and scale, image tiling is often handled by a purpose-built IIIF image server rather than the main app. IIIF Print supports external IIIF image URLs and works with services such as serverless-iiif and Cantaloupe, so the heavy image work lives where it belongs.
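
For orientation, IIIF image servers answer requests of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, as defined by the IIIF Image API. The sketch below assembles such a URL. The base URL and identifier are made up, and the default size of `full` follows Image API 2 (version 3 servers expect `max` instead).

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Assemble an IIIF Image API request URL for an external image server."""
    # Identifiers must be URL-encoded, including any slashes they contain
    encoded_id = quote(identifier, safe="")
    base = base_url.rstrip("/")
    return f"{base}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


full_image = iiif_image_url("https://images.example.org/iiif/2/", "abc/page-1.tif")
# A 1024x1024 region from the top-left corner, scaled to 512 pixels wide
tile = iiif_image_url(
    "https://images.example.org/iiif/2", "abc/page-1.tif",
    region="0,0,1024,1024", size="512,",
)
print(full_image)
print(tile)
```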

When and where to reach for it

IIIF Print is the right tool when all of these are true:

You are on Hyrax or Hyku (it does not run on its own).
Your material is page-based and benefits from a reading experience: newspapers, journals, books, manuscripts, reports, multi-page documents.
The text inside the pages matters, so OCR and search-within are valuable.
You have parent/child structure, such as issues made of pages or volumes made of chapters.

It is overkill if you only have single standalone images with no text to search and no page structure. Hyrax's built-in image display may be enough in that case.

What it does not solve (yet, or by design)

A few honest boundaries, so nobody expects the wrong thing from it:

It is not standalone. It requires an existing Hyrax or Hyku application. The roadmap includes broader Hyrax support without a Hyku dependency, but today Hyku is the primary target.
It is not a preservation system. IIIF Print is about access and display: making material readable and searchable for users. Long-term preservation in the formal sense (fixity, format normalization for archival storage, OAIS workflows) is a separate concern handled elsewhere in your stack.
It does not build the viewer or the image server itself. It orchestrates them. The Universal Viewer does the page turning, and an image server such as Cantaloupe or serverless-iiif does the tiling. IIIF Print wires them to your content.
OCR quality is bounded by the OCR engine. Tesseract handles printed text reasonably well. It is not a handwritten text recognition (HTR) tool or a substitute for manual transcription, and messy or low-resolution scans will produce messy text and weaker search results.
IIIF Presentation API manifests are currently generated as v2. Support for v3 manifests is on the roadmap, not shipped.
AllinsonFlex metadata profile support is on the roadmap, not shipped.

What it needs to run

IIIF Print leans on several external command-line tools to do its processing, so the host environment has to provide them:

Tesseract-OCR for text recognition
Ghostscript and Poppler-utils for PDF handling
ImageMagick for image conversion
LibreOffice for document conversion
FITS for file characterization
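
A quick way to check a host for these tools is to look for their executables on the `PATH`. The sketch below does that with the standard library. The executable names are assumptions and vary by platform and package version (for example, ImageMagick may install `magick` or `convert`), so adjust them to your environment.

```python
import shutil

# Tool name -> candidate executable names (assumed; any one candidate counts)
REQUIRED_TOOLS = {
    "Tesseract-OCR": ["tesseract"],
    "Ghostscript": ["gs"],
    "Poppler-utils": ["pdftoppm", "pdfinfo"],
    "ImageMagick": ["magick", "convert"],
    "LibreOffice": ["soffice", "libreoffice"],
    "FITS": ["fits.sh", "fits"],
}


def missing_tools(which=shutil.which):
    """Return the names of tools with no candidate executable on the PATH."""
    return [
        name
        for name, candidates in REQUIRED_TOOLS.items()
        if not any(which(candidate) for candidate in candidates)
    ]


print("Missing:", missing_tools() or "none")
```

The `which` parameter exists so the lookup can be swapped out, for instance when checking inside a container image or in a test.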

On the Rails side it targets Hyrax (roughly v2.5 through v3.5) and Hyku (roughly v4 through v5), and it supports both the ActiveFedora and Valkyrie persistence adapters. Ingest can run through the UI or in batch via Bulkrax.

The one-line summary

If you have digitized, page-based material and you want the public to flip through it, zoom in, and search the words inside it, IIIF Print is the engine that makes that happen on a Hyrax or Hyku repository. If you want long-term preservation, a standalone app, or handwriting recognition, look elsewhere for those parts.
