Handwritten document transcription remains labor-intensive, but new AI systems may reduce that burden. This project assesses how accurately and how usefully current AI tools transcribe handwritten documents across different genres, levels of legibility, and contextual density. More importantly, we want to identify a stable default workflow for manual correction and to build a reusable evaluation framework for a larger-scale study.
This repository includes several transcription experiments we performed on historical documents and recent handwritten notes; the results are included as PDF files in the ZIP archive. Using a one-shot prompt with Gemini 3.0, we obtained fairly accurate transcriptions, but layout peculiarities such as indentation, columns, and symbols require more refined prompting. In other words, although the LLM could transcribe even the most illegible handwriting, the exact output format needs further prompting and specification. We have therefore designed a relatively simple script that converts archival images and metadata into reviewable output.
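One way to tighten the output format is to state the layout rules explicitly in the one-shot prompt, next to the single worked example. The sketch below only assembles the prompt text. The rule wording, the `[illegible]` and `[column N]` markers, and the name `build_one_shot_prompt` are hypothetical, and sending the prompt together with the page image to Gemini is left to whichever client library you use.

```python
# Hypothetical layout rules; adjust them to the conventions your project needs.
FORMAT_RULES = (
    "Keep every line break exactly where it appears on the page.",
    "Preserve indentation using leading spaces.",
    "Transcribe multi-column pages one column at a time, starting each with [column N].",
    "Copy symbols such as §, ¶ and &c. as written instead of expanding them.",
    "Replace any word you cannot read with [illegible].",
)


def build_one_shot_prompt(example_transcription: str,
                          rules: tuple[str, ...] = FORMAT_RULES) -> str:
    """Assemble a one-shot transcription prompt: numbered rules plus one worked example."""
    numbered_rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        "Transcribe the handwritten page in the attached image.\n"
        "Follow these formatting rules exactly:\n"
        f"{numbered_rules}\n\n"
        "Example of a correctly formatted transcription of a different page:\n"
        "<example>\n"
        f"{example_transcription}\n"
        "</example>\n\n"
        "Return only the transcription of the new page, with no commentary."
    )
```

Because the single example is usually the strongest formatting signal in a one-shot prompt, it helps if the example transcription itself follows every rule, for instance `build_one_shot_prompt("Dear Sir,\n    I remain, &c.")` for a letter with an indented closing.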
The process begins by exporting item metadata and image URLs from the Omeka API, splitting the exported data by collection, and dividing each collection into smaller JSONL batches for Gemini transcription. Gemini returns the transcriptions as JSONL, and those results are then converted into readable HTML files.
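The batching and HTML steps can be sketched with the standard library alone. This is an illustration, not the repository's script: `write_batches` and `jsonl_to_html` are hypothetical names. The sketch assumes the Omeka export has already been flattened into a list of dicts with hypothetical `collection`, `id`, and `image_url` keys (the real field names depend on whether you use Omeka Classic or Omeka S and on your metadata schema). It also assumes each Gemini result line has already been reduced to hypothetical `id` and `transcription` fields, since the JSONL that Gemini returns may wrap the text in a larger response object.

```python
import html
import json
from collections import defaultdict
from pathlib import Path


def write_batches(items, out_dir, batch_size=50):
    """Group exported items by collection and write each group as JSONL batches."""
    by_collection = defaultdict(list)
    for item in items:
        by_collection[item["collection"]].append(item)  # hypothetical key

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for collection, members in by_collection.items():
        for start in range(0, len(members), batch_size):
            batch = members[start:start + batch_size]
            path = out_dir / f"{collection}_batch{start // batch_size + 1:03d}.jsonl"
            with path.open("w", encoding="utf-8") as fh:
                for item in batch:
                    # One JSON object per line; keep non-ASCII characters readable.
                    fh.write(json.dumps(item, ensure_ascii=False) + "\n")
            written.append(path)
    return written


def jsonl_to_html(jsonl_path, html_path, title="Transcriptions"):
    """Render a JSONL file of results (hypothetical "id"/"transcription" keys) as one HTML page."""
    sections = []
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            sections.append(
                f"<section>\n<h2>{html.escape(str(record['id']))}</h2>\n"
                f"<pre>{html.escape(record['transcription'])}</pre>\n</section>"
            )
    page = (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>\n"
    )
    Path(html_path).write_text(page, encoding="utf-8")
```

Wrapping each transcription in `<pre>` keeps the line breaks and indentation that the prompt asks the model to preserve, and `html.escape` stops angle brackets in a transcription from being read as markup. Collection names are used directly in file names here, so they may need sanitizing if they contain slashes or other unsafe characters.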
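To compare an AI transcription (ai.txt) against a ground-truth transcription (truth.txt), run:

```shell
python accuracy.py --ai ai.txt --truth truth.txt
```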
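A common way to score an AI transcription against a ground truth is the character error rate (CER): the Levenshtein edit distance between the two texts divided by the length of the ground truth. Word error rate (WER) is the same computation over word lists. The sketch below illustrates these standard metrics; it is not the implementation of accuracy.py, and the function names are hypothetical.

```python
from collections.abc import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                         # deletion
                current[j - 1] + 1,                      # insertion
                previous[j - 1] + (item_a != item_b),    # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ai_text: str, truth_text: str) -> float:
    """Edit distance over characters, normalized by the ground-truth length."""
    if not truth_text:
        raise ValueError("ground-truth text is empty")
    return levenshtein(ai_text, truth_text) / len(truth_text)


def word_error_rate(ai_text: str, truth_text: str) -> float:
    """Edit distance over whitespace-separated words, normalized by the ground-truth word count."""
    truth_words = truth_text.split()
    if not truth_words:
        raise ValueError("ground-truth text has no words")
    return levenshtein(ai_text.split(), truth_words) / len(truth_words)
```

For example, `character_error_rate(Path("ai.txt").read_text(), Path("truth.txt").read_text())` (with `from pathlib import Path`) scores the same pair of files. CER computed on the raw text counts indentation and line-break differences as errors, which suits layout-sensitive evaluation; normalizing whitespace first would measure only the wording.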