A rather traditional visualization of linguistic data is the practice of interspersing bits of data in descriptive texts, most obviously perhaps as examples formatted as Interlinear Glossed Text. Other examples of data in text include forms, either in running text or in a table, or just references.
To support this use case, the cldfviz.text command can fill data from a CLDF dataset into a markdown
document, where references to CLDF data objects (rows of tables or complete tables) are marked using
markdown links.
While this process may seem overly complex - after all, you could just type an IGT example into your document - there are advantages:
- Valuable structured data can be made available as CLDF data, thus opening it up for re-use (and various attempts to extract IGT from text documents are proof that this content is considered valuable).
- Rendering of structured data can be made more uniform and automatic - thereby simplifying authoring of such documents.
Basically, we want to have data in two places, and the better authoritative place for it seems to be the CLDF dataset.
With cldfviz.text we want to make it easier to "pull data back into text".
CLDF Markdown is specified as CLDF extension
since CLDF 1.2. cldfviz.text provides a customizable CLDF Markdown renderer, which converts CLDF Markdown
to regular markdown, with CLDF Markdown links replaced by rendering Jinja templates.
Rendering of data objects is controlled with templates using the
Jinja template language. Sometimes, templates can be parametrized,
e.g. to choose only cognates belonging to the same cognate set from a CognateTable. These parameters can
be specified as query string of the reference URL, e.g.
[cognateset X](some/path/CognateTable?cognatesetReference=X#cldf:__all__)
In addition to data objects you can also specify
- maps to be created with
cldfviz.mapand - trees to be created with
cldfviz.tree
for inclusion in the resulting markdown document; e.g.:

An example of a document rendered with cldfviz.text is text_example/README.md,
several paragraphs of WALS' chapter 21, rewritten in
"CLDF markdown" and rendered by "filling in" data from
WALS as CLDF dataset.
Any object in a CLDF dataset may reference sources. These references are typically rendered by the templates associated with the object type. Thus, when a complete CLDF markdown document is rendered, it may be desireable to include a list of all cited sources. This is supported as follows:
- To insert a list of cited references in the document, add the link
at the location where the list should appear.
[References](Source?cited_only#cldf:__all__) - Discovery of cited sources relies on the references being rendered as links. Thus, it is necessary that all
CLDF markdown links in the document are specified adding the
with_internal_ref_linkURL parameter. - If
Sourceinstances are referenced directly (rather than implicitly as reference for another object), therefURL parameter needs to be supplied, e.g.see [Meier 2012](Source?ref&with_internal_ref_links#cldf:Meier2012). - To support flexible link labels, e.g. for cases like "Meier (2012, 2013)", the label of the CLDF markdown
link can be kept in place (and not be replaced with a label generated from the source metadata), by providing
the
keep_labelURL parameter, i.e. with markdown likeMeier ([2012](Source?ref&with_internal_ref_links&keep_label#cldf:Meier2012), [2013](Source?ref&with_internal_ref_links&keep_label#cldf:Meier2013))
Sometimes it is desirable to render an object by just displaying a particular property, e.g. to
display the name or description of a Parameter as document title. This can be done using the
property.md template:
# [](ParameterTable?__template__=property.md&property=name#cldf:param1)
$ cldfbench cldfviz.text -h
usage: cldfbench cldfviz.text [-h] [-l] [--text-string TEXT_STRING]
[--text-file TEXT_FILE] [--templates TEMPLATES]
[--output OUTPUT]
DATASET
positional arguments:
DATASET Dataset specification (i.e. path to a CLDF metadata
file or to the data file)
optional arguments:
-h, --help show this help message and exit
-l, --list list templates (default: False)
--text-string TEXT_STRING
--text-file TEXT_FILE
--templates TEMPLATES
--output OUTPUT-
Rendering a parameter description from a CLDF StructureDataset:
$ cldfbench cldfviz.text --text-string "[](ParameterTable#cldf:D)" tests/StructureDataset/StructureDataset-metadata.json **Oblique Nouns** Description: Oblique stems of common nouns Codes: - 0: 0 - 1: 1 - 2: 2
-
LGR
$ cldfbench cldfviz.text https://raw.githubusercontent.com/cldf-datasets/lgr/main/cldf/Generic-metadata.json --text-string "[](ExampleTable#cldf:1)" > (1) Indonesian (Sneddon 1996: 237) <pre> Mereka di Jakarta sekarang. They in Jakarta now ‘They are in Jakarta now.’</pre>
-
Maps and trees: Including maps and trees in CLDF Markdown can be very powerful. But as soon as it comes to customization, the nested formatting - e.g. passing a
toytreestyles dictionary as URL parameter - makes the process fragile. But as a proof-of-concept example you may look at the markdown source ofwals_algic_tmpl.mdand the markdown file created from it runningcldfbench cldfviz.text wals-2020.3/ --text-file wals_algic.md
While CLDF is designed to make merging data from multiple datasets easy, one may still want to reference data from multiple datasets in the same CLDF markdown document. This is supported as follows.
The cldfbench cldfviz.text command accepts multiple datasets (specified by metadata file). To disambiguate
references, the URL fragment in reference links must also specify the dataset
- either by position in the argument list (starting from "1")
- or using an identifier assigned to the dataset (by appending it separated by
#to the metadata file).
So for example, you can print sources from two CLDF datasets via
$ cldfbench cldfviz.text \
peterson:tests/StructureDataset/StructureDataset-metadata.json list:tests/Wordlist/Wordlist-metadata.json \
--text-string "[](Source#cldf-peterson:Peterson2017) [](Source#cldf-list:List2014e)"
Peterson, John. 2017. Fitting the pieces together - Towards a linguistic prehistory of eastern-central South Asia (and beyond). Journal of South Asian Languages and Linguistics 4. Walter de Gruyter {GmbH}.
List, J.-M. and Prokić, J. 2014. A benchmark database of phonetic alignments in historical linguistics and dialectology. In Calzolari, Nicoletta and Choukri, Khalid and Declerck, Thierry and Loftsson, Hrafn and Maegaard, Bente and Mariani, Joseph and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios (eds.), Proceedings of the Ninth International Conference on Language Resources and Evaluation, 288-294. Reykjavik, Iceland: European Language Resources Association (ELRA).Similarly, for cldfviz.map references, use cldfviz.map-<id> as URL fragment in the relevant links.
There are several other efforts towards making Markdown a viable authoring platform for linguistic texts:
- Florian Matter's pylingdocs provides an authoring platform on top of CLDF Markdown
- Taras Zakahrko's pandoc-glossify provides a Pandoc extension to render interlinear glossed text
- Michael Cysouw's pandoc-ling also provides a Pandoc extension to render interlinear glossed text (testifying to the usefulness of Pandoc as cross-format authoring platform).