In this work, a deep learning pipeline is proposed that learns to date manuscripts based on handwriting.
To run the scripts, first install all Python requirements:
pip install -r requirements.txt
Go to the src folder. The following command then runs the pipeline for an image (e.g., test.ppm):
python dating_pipeline.py --img test.ppm --dataset mps
--img specifies the path of the image, and --dataset selects which dataset's weights to use (mps, clamm, scribble, or himanis). The weights are downloaded automatically from Hugging Face (https://huggingface.co/nikolai40/deep-dating).
To remove comments from the CLaMM dataset, see: https://github.com/NikolaiHerrmann/comment-remover
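As a rough illustration of the two flags described above, the following sketch builds an equivalent command-line parser. It is not the actual code of dating_pipeline.py: the function name build_parser, the help texts, and the choice to make both flags required are assumptions made for this example.

```python
import argparse

# Dataset names come from the README; everything else here is illustrative.
DATASET_CHOICES = ("mps", "clamm", "scribble", "himanis")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the two documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Date a manuscript image.")
    parser.add_argument("--img", required=True, help="path of the input image")
    parser.add_argument(
        "--dataset",
        required=True,  # assumption: the real script may define a default instead
        choices=DATASET_CHOICES,
        help="which dataset weights to use",
    )
    return parser


# Parse the example invocation shown above rather than reading sys.argv.
args = build_parser().parse_args(["--img", "test.ppm", "--dataset", "mps"])
print(args.img, args.dataset)
```

Restricting --dataset with choices means a misspelled dataset name is rejected before any weights are downloaded.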
The pipeline will produce a plot similar to this:

The datasets used in this project:
- MPS: https://zenodo.org/records/1194357
- CLaMM: https://clamm.irht.cnrs.fr/icdar-2017/download/
- ScribbleLens: https://openslr.org/84/
- Himanis: private

The src folder is organized as follows:
- augmentation: augmentations using the imagemorph (https://github.com/GrHound/imagemorph.c) program (didn't end up using it) and the augraphy library (https://github.com/sparkfish/augraphy).
- datasets: loads all datasets, see dating_util.py and dating_dataset.py to set paths of the datasets
- metrics: calculates the cumulative score (CS), mean squared error (MSE) and mean absolute error (MAE)
- networks: architectures for networks: BiNet and Inception-ResNet-v2
- prediction: scripts to call trained models for prediction
- preprocessing: patch extraction and other pre-processing methods
- summary: generates all plots, figures and tables
- util: util functions that are used throughout
- dating_pipeline.py: run pipeline (see how above)
- image_stats.py: calculates the mean and standard deviation of images for a given dataset
- patch_plot.py: get predictions on patch level (similar to dating_pipeline.py), can also produce saliency maps
- plot.py: generate all or a specific plot
- preprocessing.py: run pre-processing such as patch extraction
- train.py: main start script to train all models. Trained models are put into a directory called runs_v2.
- unet_pipeline.py: run the adapted BiNet model
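
The three metrics named under the metrics folder can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in the repository: the function names are hypothetical, and the cumulative score is assumed to follow the common definition, namely the percentage of samples whose absolute error is at most a tolerance alpha (in years).

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between true and predicted years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between true and predicted years."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cumulative_score(
    y_true: Sequence[float], y_pred: Sequence[float], alpha: float
) -> float:
    """Percentage of predictions within alpha years of the true date (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= alpha)
    return 100.0 * hits / len(y_true)


# Made-up example values, purely for illustration.
true_years = [1300, 1325, 1350, 1400]
pred_years = [1310, 1325, 1390, 1375]

print(mae(true_years, pred_years))                         # 18.75
print(mse(true_years, pred_years))                         # 581.25
print(cumulative_score(true_years, pred_years, alpha=25))  # 75.0
```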
Set the DATASETS_PATH variable in dating_util.py to the location of the datasets.
Downloading the datasets, unzipping them, and placing them in a datasets folder should be enough.
See dating_dataset.py for how the datasets are loaded; individual paths can also be changed there.
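
Before training, it can help to check that the datasets folder actually contains what is expected. The sketch below is only an assumption about how such a check could look: missing_datasets is a hypothetical helper, and the subfolder names "MPS" and "CLaMM" are placeholders, since the real folder names are defined in dating_dataset.py.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_datasets(datasets_path: str, names: Iterable[str]) -> List[str]:
    """Return the names that have no matching subfolder under datasets_path."""
    root = Path(datasets_path)
    return [name for name in names if not (root / name).is_dir()]


# Demonstrate on a temporary folder so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "MPS").mkdir()  # placeholder folder name
    print(missing_datasets(tmp, ["MPS", "CLaMM"]))  # ['CLaMM']
```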
The rasterfairy package may raise an error because it uses np.float, which is deprecated (and removed as of NumPy 1.24). To fix this, change line 134 in rasterfairy.py from
gridPoints2d[q['indices'][0]] = np.array(q['grid'][0:2], dtype=np.float)
to
gridPoints2d[q['indices'][0]] = np.array(q['grid'][0:2], dtype=float)
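
If editing the file by hand is inconvenient, the same substitution can be applied programmatically. The following is a minimal sketch, assuming a plain text replacement is acceptable. patch_np_float is a hypothetical helper, not part of rasterfairy or this repository, and it only transforms a string; reading and writing rasterfairy.py is left to the reader.

```python
import re

# Matches the removed alias np.float, but not np.float32, np.float64, or np.float_.
NP_FLOAT_ALIAS = re.compile(r"\bnp\.float\b")


def patch_np_float(source: str) -> str:
    """Replace every use of the np.float alias with the builtin float."""
    return NP_FLOAT_ALIAS.sub("float", source)


old_line = "gridPoints2d[q['indices'][0]] = np.array(q['grid'][0:2], dtype=np.float)"
print(patch_np_float(old_line))
```

The word boundary after float keeps sized types such as np.float64 untouched, so the helper only rewrites the alias that NumPy removed.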