A template project for those wanting to create an index locorum for their publications.
See INSTALL.md.
This template comes with batteries included, but you will need to adapt the configuration slightly. The project configuration file is config/project.ini.
Make sure you change the path for the following settings:
preproc.treetagger_home: the path to TreeTagger
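Before running anything, it can help to confirm that the configured TreeTagger path actually exists. The following is a minimal sketch, not part of the CitationExtractor: it assumes that the dotted name preproc.treetagger_home means a key treetagger_home inside a [preproc] section of an INI file readable by Python's configparser. The function names are hypothetical.

```python
import configparser
from pathlib import Path


def treetagger_home_from_config(config_path):
    """Return the TreeTagger path stored in the project configuration.

    Assumption: `preproc.treetagger_home` is the key `treetagger_home`
    in the `[preproc]` section of the INI file.
    """
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(config_path):
        raise FileNotFoundError(f"cannot read config file: {config_path}")
    return Path(parser.get("preproc", "treetagger_home")).expanduser()


def treetagger_home_is_valid(config_path):
    """True when the configured TreeTagger path is an existing directory."""
    return treetagger_home_from_config(config_path).is_dir()
```

Calling treetagger_home_is_valid("config/project.ini") from the project root returns False as long as the setting still points to a location that does not exist on your machine.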
This template project comes with a short example document: Bryn Mawr Classical Review 2013-01-10.
The documents to be processed need to be placed in the sub-folder orig within your working directory.
In this example project general.working_dir = ./data, thus the input files are placed in ./data/orig/. The script will then create further subfolders to store temporary or intermediate files.
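To add your own documents, you only need to copy them into the orig sub-folder of the working directory. The sketch below does this with the standard library; stage_documents is a hypothetical helper, and the default of ./data simply mirrors the general.working_dir value of this example project.

```python
import shutil
from pathlib import Path


def stage_documents(sources, working_dir="./data"):
    """Copy each source file into <working_dir>/orig and return the new paths."""
    orig_dir = Path(working_dir) / "orig"
    # Create the input folder (and the working directory) if they are missing.
    orig_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for source in sources:
        target = orig_dir / Path(source).name
        shutil.copy2(source, target)  # keeps the file name and timestamps
        staged.append(target)
    return staged
```

For instance, stage_documents(["my_article.txt"]) would place a copy at data/orig/my_article.txt (the file name here is only a placeholder).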
When you install the CitationExtractor (version >= 1.7.0), the bash command citedloci-pipeline is automatically installed on your system; it lets you run the pipeline.
For a detailed explanation of each pipeline step, please refer to the Jupyter notebook step-by-step.ipynb.
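The rest of this README runs four steps one at a time: preproc, ner, relex and ned. If you would rather launch them in one go, a small wrapper around the command line is enough. This is a sketch under the assumption that citedloci-pipeline is on your PATH and that the steps must run in the order shown in this README; build_command and run_pipeline are hypothetical names, not part of the CitationExtractor.

```python
import subprocess

# Step names as they appear in the commands of this README, in order.
STEPS = ("preproc", "ner", "relex", "ned")


def build_command(step, config_path="config/project.ini"):
    """Return the argument list for one pipeline step."""
    if step not in STEPS:
        raise ValueError(f"unknown pipeline step: {step}")
    return ["citedloci-pipeline", "do", step, f"--config={config_path}"]


def run_pipeline(config_path="config/project.ini", steps=STEPS):
    """Run the steps in order, stopping at the first one that fails."""
    for step in steps:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(step, config_path), check=True)
```

Calling run_pipeline() from the project root is then equivalent to typing the four commands by hand. Running them individually, as described in the remainder of this README, remains useful for inspecting the intermediate files.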
citedloci-pipeline do preproc --config=config/project.ini
At this point you should have a tokenized and PoS-tagged file at data/iob/bmcr_2013-01-10.txt (if you've kept the default project settings).
Try:
cat data/iob/bmcr_2013-01-10.txt
citedloci-pipeline do ner --config=config/project.ini
At this point you should have a JSON file with entities annotated at data/json/bmcr_2013-01-10.json.
Try:
# requires jq, see https://stedolan.github.io/jq/download/
cat data/json/bmcr_2013-01-10.json|jq ".entities"
citedloci-pipeline do relex --config=config/project.ini
At this point you should have a JSON file with relations annotated at data/json/bmcr_2013-01-10.json (it overwrites the previous one).
Try:
# requires jq, see https://stedolan.github.io/jq/download/
cat data/json/bmcr_2013-01-10.json|jq ".relations"
citedloci-pipeline do ned --config=config/project.ini
At this point you should have a JSON file with entities disambiguated at data/json/bmcr_2013-01-10.json (it overwrites the previous one).
Try:
# requires jq, see https://stedolan.github.io/jq/download/
cat data/json/bmcr_2013-01-10.json|jq ".entities"
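If jq is not available, the same file can be inspected from Python. The sketch below only relies on the two top-level keys used in the jq commands above, entities and relations; it makes no assumption about what each entry contains, and summarize is a hypothetical helper name.

```python
import json


def summarize(json_path):
    """Count the entries stored under the `entities` and `relations` keys."""
    with open(json_path, encoding="utf-8") as handle:
        document = json.load(handle)
    # len() works whether a key holds a JSON object or a JSON array;
    # a key that is absent is reported as 0.
    return {key: len(document.get(key, [])) for key in ("entities", "relations")}
```

Running summarize("data/json/bmcr_2013-01-10.json") after each step gives a quick way to see whether the step added anything to the file.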