This is a tool to extract structured data from fixed-layout documents such as inventory cards, inventory books or tables. It requires layout definitions provided in inventory_ocr/config/regions.yaml. Currently, it is optimized to work with a collection of jewelry from the Schmuckmuseum Pforzheim, but with minor changes it should work for other use cases.
Specifically, to adapt this for your own data:
- Define a different document layout in inventory_ocr/config/regions.yaml
- Implement your own post-processing by subclassing the abstract PostProcessor class in inventory_ocr/postprocessor.py
Set up your environment using your favourite environment management tool. We recommend venv.
- Development (editable) install: Clone this repository, navigate into the project root and type
  pip install -e .
- Regular install: Activate your virtual environment and type
  pip install inventory-ocr
The package will install a platform-specific OCR backend by default:
- On macOS the package uses Apple Vision via the ocrmac package.
- On Windows and Linux the package uses pero-ocr.
A plain pip install will therefore pull in the appropriate OCR backend for your OS automatically (see pyproject.toml markers).
Invoke the tool with inventory-ocr <input folder>, where <input folder> is the path to the inventory cards to be processed. Results will be saved to output/ relative to the current working directory.
inventory-ocr <input_dir> [options]
Required Arguments:
input_dir: Path to the input directory containing files to process
Optional Arguments:
--output_dir: Path to the output directory (default: ./output)
--layout_config: Path to the layout configuration file (YAML). Defaults to 'inventory_ocr/config/regions.yaml'.
--annotate: Launch the interactive layout annotation UI to create/save a regions file
--ocr_engine: OCR engine to use (choices: auto, ocrmac, pero, mistral, dummy; default: auto)
--eval: Run in evaluation mode (uses dummy detector and benchmarking postprocessor)
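As a reading aid, the documented options and defaults can be mirrored in a small argparse parser. This is only a sketch reconstructed from the option list above; the function name build_parser is an assumption, and the package's actual argument handling may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(prog="inventory-ocr")
    parser.add_argument("input_dir", help="Directory containing files to process")
    parser.add_argument("--output_dir", default="./output",
                        help="Directory for results")
    parser.add_argument("--layout_config",
                        default="inventory_ocr/config/regions.yaml",
                        help="Path to the layout configuration file (YAML)")
    parser.add_argument("--annotate", action="store_true",
                        help="Launch the interactive layout annotation UI")
    parser.add_argument("--ocr_engine", default="auto",
                        choices=["auto", "ocrmac", "pero", "mistral", "dummy"],
                        help="OCR engine to use")
    parser.add_argument("--eval", action="store_true",
                        help="Run in evaluation mode")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["./data", "--ocr_engine", "dummy"])
print(args.input_dir, args.output_dir, args.ocr_engine)
```

Flags that are not given fall back to the documented defaults, so a bare input directory is enough for a first run.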
- The repository contains an example regions file under inventory_ocr/config/example_regions.yaml. You can copy this to inventory_ocr/config/regions.yaml to use it as the default layout definition.
- To create your own layout interactively, run the tool with the --annotate flag. This launches a small Gradio UI that lets you mark regions on a template image. When you press "Extract Layout" the regions are written to the layout configuration path (see --layout_config, default inventory_ocr/config/regions.yaml) and the annotation UI will close.
- If you prefer not to use the example, simply delete or rename inventory_ocr/config/example_regions.yaml, or replace inventory_ocr/config/regions.yaml with your own configuration file.
Example:
# use the bundled example regions (if present)
inventory-ocr ./data
# interactively generate and save regions to the default config path
inventory-ocr ./data --annotate
# specify a custom path for the generated regions
inventory-ocr ./data --annotate --layout_config ./my_regions.yaml
Generated YAML uses a top-level regions: mapping where each field name maps to a 4-element list [x1, y1, x2, y2] with normalized coordinates (0..1).
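If you edit a regions file by hand, a quick sanity check can catch malformed entries before running OCR. The sketch below assumes the file has already been loaded into a Python dict (for example with a YAML parser); the helper validate_regions and the field name "title" are hypothetical and not part of the package.

```python
from typing import List


def validate_regions(config: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    regions = config.get("regions")
    if not isinstance(regions, dict):
        return ["missing top-level 'regions' mapping"]
    problems = []
    for name, box in regions.items():
        if not (isinstance(box, (list, tuple)) and len(box) == 4):
            problems.append(f"{name}: expected [x1, y1, x2, y2], got {box!r}")
            continue
        x1, y1, x2, y2 = box
        if not all(isinstance(v, (int, float)) and 0.0 <= v <= 1.0 for v in box):
            problems.append(f"{name}: coordinates must be numbers in 0..1")
        elif x1 >= x2 or y1 >= y2:
            problems.append(f"{name}: top-left must lie above and left of bottom-right")
    return problems


# Dict equivalent of a small regions file (field names are illustrative).
example = {"regions": {"title": [0.05, 0.05, 0.60, 0.15],
                       "am": [0.65, 0.05, 0.95, 0.15]}}
print(validate_regions(example))
```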
The tool supports multiple OCR engines with automatic platform-based selection:
- PERO OCR (pero): Default for non-macOS platforms
- Apple Vision API (ocrmac): Default for macOS platforms
- Mistral OCR (mistral): AI-powered OCR using Mistral models
- Dummy (dummy): For development and testing purposes
- Auto (auto): Automatically selects ocrmac on macOS, pero on other platforms
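The auto rule described above can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not the package's own code; the function resolve_engine is a hypothetical name, and the check relies on platform.system() returning "Darwin" on macOS.

```python
import platform
from typing import Optional

ENGINES = ("auto", "ocrmac", "pero", "mistral", "dummy")


def resolve_engine(requested: str = "auto", system: Optional[str] = None) -> str:
    """Map an --ocr_engine value to a concrete engine name."""
    if requested not in ENGINES:
        raise ValueError(f"unknown OCR engine: {requested!r}")
    if requested != "auto":
        return requested  # an explicit choice is passed through unchanged
    # 'system' can be overridden for testing; default to the current OS.
    system = system or platform.system()
    return "ocrmac" if system == "Darwin" else "pero"


print(resolve_engine("auto"))
```

Passing an explicit engine such as mistral bypasses the platform check entirely.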
Different OCR engines require additional dependencies:
For PERO OCR (non-macOS default):
pip install inventory-ocr[pero]
For Mistral OCR:
pip install inventory-ocr[mistral]
You'll also need to obtain a Mistral API key and store it in a .env file in your project root:
- Copy .env.template to .env:
  cp .env.template .env
- Edit .env and replace INSERT_YOUR_KEY_HERE with your actual Mistral API key:
  MISTRAL_API_KEY=your_actual_api_key_here
Note: Never share your API key publicly or commit it to version control.
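To check that the key is picked up before starting a long run, a small standard-library script can read the .env file. This is a sketch only: the package may load the file differently (for instance through a dotenv library), and the helpers load_env_file and get_mistral_api_key are hypothetical names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_mistral_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("MISTRAL_API_KEY") or load_env_file(path).get("MISTRAL_API_KEY", "")
    if not key or key == "INSERT_YOUR_KEY_HERE":
        raise RuntimeError("MISTRAL_API_KEY is missing or still the template placeholder")
    return key
```

Calling get_mistral_api_key() from the project root raises a RuntimeError if the placeholder from .env.template was never replaced.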
For Apple Vision API (macOS only):
The ocrmac package is automatically installed on macOS systems. No additional installation required.
Note: The auto engine selection will use Apple Vision API on macOS (if available) and PERO OCR on other platforms.
Regions (in relative coordinates) and field names are stored in inventory_ocr/config/regions.yaml (or the path you pass via --layout_config) and can be adapted according to the use case.
The region definition format uses normalized coordinates where each region is defined as [x1, y1, x2, y2] with values between 0 and 1:
x1, y1: Top-left corner coordinates (relative to document width and height)
x2, y2: Bottom-right corner coordinates (relative to document width and height)
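Because the coordinates are relative, the same region applies to scans of any resolution. The following sketch shows how a normalized region could be turned into a pixel box for a given image size; to_pixel_box is a hypothetical helper, and rounding to the nearest pixel is an assumption about how one might crop.

```python
from typing import Sequence, Tuple


def to_pixel_box(region: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale [x1, y1, x2, y2] in 0..1 to integer pixel coordinates."""
    x1, y1, x2, y2 = region
    # x values scale with the image width, y values with the image height.
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


box = to_pixel_box([0.1, 0.2, 0.5, 0.4], width=2000, height=1000)
print(box)  # (200, 200, 1000, 400)
```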
custom_header_mappings:
  am: 'erworben am'  # Maps "am" field to "erworben am" in output
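The effect of such a mapping can be pictured as a key rename on each extracted record. The sketch below is an assumption about the behaviour based on the comment above, not the package's implementation; apply_header_mappings, the sample record, and the field name "Gegenstand" are hypothetical.

```python
def apply_header_mappings(record: dict, mappings: dict) -> dict:
    """Rename mapped field names; unmapped fields keep their original name."""
    return {mappings.get(field, field): value for field, value in record.items()}


mappings = {"am": "erworben am"}
record = {"am": "12.03.1957", "Gegenstand": "Brosche"}  # illustrative values
print(apply_header_mappings(record, mappings))
```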