Add three historical OCR transcription questions to HistBench #2

Open
jvpoulos wants to merge 1 commit into CharlesQ9:main from jvpoulos:historical_ocr

@jvpoulos jvpoulos commented Aug 8, 2025

This PR adds three new OCR transcription questions to HistBench. Each is drawn from a historical manuscript of a different period, language, and script, and tests the agent's ability to transcribe historical handwriting.

New Questions Added

  1. Question 415: George Washington Manuscript (18th century)
    - Source: Washington Database, line 302-15
    - Language: English
    - Script: 18th-century cursive
  2. Question 416: Saint Gall Manuscript (9th century)
    - Source: Saint Gall Database, line csg562-003-02
    - Language: Latin
    - Script: Carolingian minuscule
  3. Question 417: Parzival Manuscript (13th century)
    - Source: Parzival Database, line d-287b-050
    - Language: Middle High German
    - Script: Gothic
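
This PR does not say how transcription answers are scored. As an illustration only, one common metric for handwriting transcription is the character error rate (CER): the Levenshtein edit distance between the reference and the predicted line, divided by the reference length. The sketch below assumes that metric. The function names and sample strings are hypothetical and are not taken from HistBench or from the three databases.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # delete ref_char
                curr[j - 1] + 1,     # insert hyp_char
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by reference length (assumed metric)."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical strings, not real ground truth from any of the databases.
sample_reference = "abcd"
sample_prediction = "abxd"
sample_cer = character_error_rate(sample_reference, sample_prediction)
```

A CER of 0 means an exact match. Because insertions count as errors, the value can exceed 1 when the prediction is much longer than the reference.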

Dataset Sources & Citations

  • Washington Database: A. Fischer et al. (2012). "Lexicon-Free Handwritten Word Spotting Using Character HMMs," Pattern Recognition Letters, 33(7), 934-942.
  • Saint Gall Database: A. Fischer et al. (2011). "Transcription alignment of Latin manuscripts using hidden Markov models," Proc. 1st Int. Workshop on Historical Document Imaging and Processing, 29-36.
  • Parzival Database: A. Fischer et al. (2009). "Automatic transcription of handwritten medieval documents," 15th Int. Conf. on Virtual Systems and Multimedia, 137-142.
  • Attention Networks: J. Poulos & R. Valle (2021). "Character-based handwritten text transcription with attention networks," Neural Computing and Applications, 33(16), 10563-10573.
