AI Approaches to Handwriting Transcription

Project Overview

Handwritten document transcription remains labor-intensive, but new AI systems may reduce that burden. This project aims to assess how accurately and usefully current AI tools can transcribe handwritten documents across different genres, levels of legibility, and contextual density. More importantly, we want to identify a stable default workflow for manual correction and build a reusable evaluation framework for a larger-scale study.

Experiment Results

This repository includes several transcription experiments we performed on historical documents and recent handwritten notes. The results are attached as PDF files in the ZIP archive. Using a one-shot prompt with Gemini 3.0, we obtained fairly accurate results, but layout peculiarities such as indentation, columns, and symbols require more refined prompting. Although the LLM could transcribe even the most illegible handwriting, the exact output format still needs further prompting and specification. We have therefore designed a relatively simple script to convert archival images and metadata into reviewable output.

Scale-up Workflow

The process begins by exporting item metadata and image URLs from the Omeka API. The exported data is split by collection, and each collection is then divided into smaller JSONL batches for Gemini transcription. Gemini returns the transcription results as JSONL, which are finally converted into readable HTML files.
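The batching and HTML conversion steps described above can be sketched as follows. This is a minimal illustration, not the code in split_batch.py or jsonl_to_html.py: the function names and the record fields `id` and `transcription` are hypothetical, and the real Omeka export and Gemini output may use a different schema.

```python
import html
import json


def split_into_batches(records, batch_size):
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def write_jsonl(records, path):
    """Write one JSON object per line (the JSONL convention)."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def records_to_html(records):
    """Render records as a simple reviewable HTML page.

    The keys "id" and "transcription" are assumed for illustration.
    Text is escaped so that characters such as "<" display literally.
    """
    parts = ["<html><body>"]
    for record in records:
        title = html.escape(str(record.get("id", "")))
        text = html.escape(record.get("transcription", ""))
        parts.append(f"<h2>{title}</h2>\n<pre>{text}</pre>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Wrapping each transcription in a `<pre>` element preserves line breaks and indentation, which matters here because layout features such as columns and indentation are part of what reviewers need to check.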

Run script

To compare an AI transcription (ai.txt) against a ground-truth transcription (truth.txt), run:

python accuracy.py --ai ai.txt --truth truth.txt
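This README does not state which metric accuracy.py reports. As one possibility, a common measure for transcription quality is the character error rate (CER): the edit distance between the AI output and the ground truth, divided by the length of the ground truth. The sketch below is an assumption for illustration only, with hypothetical function names, and is not taken from accuracy.py.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ai_text, truth_text):
    """Edit distance normalized by the ground-truth length."""
    if not truth_text:
        return 0.0 if not ai_text else 1.0
    return levenshtein(ai_text, truth_text) / len(truth_text)
```

A CER of 0.0 means the two texts are identical. Because insertions count as errors, the value can exceed 1.0 when the AI output is much longer than the ground truth.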

