A Generalized Deep Learning-Based Pipeline for Historical Manuscript Dating

In this work, a deep learning pipeline is proposed that learns to date manuscripts based on handwriting.

To run the scripts, first install all Python requirements:

pip install -r requirements.txt

Go to the src folder. The following command then runs the pipeline for an image (e.g., test.ppm):

python dating_pipeline.py --img test.ppm --dataset mps

--img specifies the path of the image and --dataset selects which dataset's weights to use (mps, clamm, scribble, or himanis). The weights are downloaded automatically from Hugging Face (https://huggingface.co/nikolai40/deep-dating).
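To date several images in one go, the command above can be wrapped in a small script. The following is a minimal sketch, assuming it is run from the src folder; the helper names build_command and run_folder and the folder name pages are hypothetical, and only the --img and --dataset flags documented above are used.

```python
import subprocess
import sys
from pathlib import Path

# Dataset names accepted by --dataset, as listed above.
VALID_DATASETS = ("mps", "clamm", "scribble", "himanis")


def build_command(img_path, dataset):
    """Return the argument list for one run of dating_pipeline.py."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}, expected one of {VALID_DATASETS}")
    return [sys.executable, "dating_pipeline.py",
            "--img", str(img_path), "--dataset", dataset]


def run_folder(folder, dataset="mps", pattern="*.ppm"):
    """Run the pipeline once per matching image in `folder` (hypothetical helper)."""
    for img_path in sorted(Path(folder).glob(pattern)):
        # check=True stops the loop if one run fails.
        subprocess.run(build_command(img_path, dataset), check=True)


# Example call ("pages" is a placeholder folder name):
# run_folder("pages", dataset="mps")
```

Each image is processed in a separate process, so model loading is repeated on every run; for large batches, calling the prediction code directly would likely be faster.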

To remove comments from the CLaMM dataset see: https://github.com/NikolaiHerrmann/comment-remover

The pipeline will produce a plot similar to this:

[Figure: example plot produced by the pipeline]

Datasets

The datasets used in this project:

MPS: https://zenodo.org/records/1194357
CLaMM: https://clamm.irht.cnrs.fr/icdar-2017/download/
ScribbleLens: https://openslr.org/84/
Himanis: private

Code overview

Directories

augmentation: augmentations using the imagemorph program (https://github.com/GrHound/imagemorph.c), which was not used in the end, and the augraphy library (https://github.com/sparkfish/augraphy)
datasets: loads all datasets; see dating_util.py and dating_dataset.py to set the dataset paths
metrics: calculates the cumulative score (CS), mean squared error (MSE) and mean absolute error (MAE)
networks: architectures for networks: BiNet and Inception-ResNet-v2
prediction: scripts to call trained models for prediction
preprocessing: patch extraction and other pre-processing methods
summary: generates all plots, figures and tables
util: utility functions used throughout the code
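The three metrics computed in the metrics directory can be sketched as follows. This is a minimal pure-Python illustration, not the repository's implementation: the function names and the threshold parameter alpha (in years) are assumptions, and the cumulative score is taken in its common form, the percentage of predictions whose absolute error is at most alpha.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted years."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cumulative_score(y_true, y_pred, alpha=25):
    """Percentage of predictions with absolute error <= alpha years.

    The default alpha=25 is an illustrative assumption.
    """
    hits = sum(abs(t - p) <= alpha for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)


# Made-up example values, for illustration only.
y_true = [1300, 1400, 1500, 1600]
y_pred = [1310, 1450, 1500, 1570]
print(mean_absolute_error(y_true, y_pred))   # 22.5
print(mean_squared_error(y_true, y_pred))    # 875.0
print(cumulative_score(y_true, y_pred, 25))  # 50.0
```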

Main files

dating_pipeline.py: runs the pipeline (see usage above)
image_stats.py: calculates the mean and standard deviation of images for a given dataset
patch_plot.py: gets patch-level predictions (similar to dating_pipeline.py); can also produce saliency maps
plot.py: generate all or a specific plot
preprocessing.py: run pre-processing such as patch extraction
train.py: main start script to train all models. Trained models are put into a directory called runs_v2.
unet_pipeline.py: run the adapted BiNet model
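As an illustration of the kind of computation image_stats.py performs, the mean and standard deviation over a whole dataset can be accumulated image by image without holding all pixels in memory. This is a minimal sketch under stated assumptions, not the repository's code: the function name dataset_mean_std is hypothetical, each image is assumed to be already flattened into a sequence of pixel intensities, and the population standard deviation is used.

```python
import math


def dataset_mean_std(images):
    """Return (mean, std) over all pixels of all images.

    `images` is any iterable of flat pixel sequences (an assumption).
    Only running sums are kept, so images can be streamed from disk.
    """
    count = 0
    total = 0.0
    total_sq = 0.0
    for pixels in images:
        for value in pixels:
            count += 1
            total += value
            total_sq += value * value
    if count == 0:
        raise ValueError("no pixels given")
    mean = total / count
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Two tiny made-up "images" with two pixels each.
mean, std = dataset_mean_std([[0, 2], [4, 6]])
print(mean)  # 3.0
```

For colour images, the same accumulation would be done once per channel.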

Set the DATASETS_PATH variable in dating_util.py to the location of the datasets. Downloading the datasets, unzipping them, and putting them into a datasets folder should be enough. See dating_dataset.py for how the datasets are loaded; individual paths can also be changed there.

Debug

RasterFairy

The package may fail because it uses np.float, which was deprecated in NumPy 1.20 and removed in NumPy 1.24. To fix this, change line 134 in rasterfairy.py from gridPoints2d[q['indices'][0]] = np.array(q['grid'][0:2], dtype=np.float) to gridPoints2d[q['indices'][0]] = np.array(q['grid'][0:2], dtype=float).
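The same replacement can also be applied with a short script instead of editing by hand. This is a minimal sketch: the helper names fix_np_float and patch_file are hypothetical, and the location of the installed rasterfairy.py has to be supplied by the reader. The pattern matches only the bare np.float alias, so names such as np.float32 or np.float64 are left untouched.

```python
import re
from pathlib import Path

# \b on both sides keeps np.float32, np.float64 and np.float_ intact.
_NP_FLOAT = re.compile(r"\bnp\.float\b")


def fix_np_float(source):
    """Replace the removed np.float alias with the builtin float."""
    return _NP_FLOAT.sub("float", source)


def patch_file(path):
    """Rewrite the file at `path` in place (back it up first)."""
    file_path = Path(path)
    file_path.write_text(fix_np_float(file_path.read_text()))


line = "gridPoints2d[q['indices'][0]] = np.array(q['grid'][0:2], dtype=np.float)"
print(fix_np_float(line))
```

Note that this rewrites every bare np.float in the file, not only line 134.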
