Yiddish texts for training TTS

A collection of Yiddish texts paired with audio recordings.

Repo structure:

catalog.csv: A spreadsheet containing bibliographic information and links
txt/: Text files; these are periodically updated with corrections and then moved into txt/for_training/
pdf/: Original PDF versions of the texts
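
As a rough illustration of how catalog.csv could be consumed, the sketch below reads a catalog and keeps only the rows flagged as hand-corrected. The column names (hand_corrected, audio_url, title), the accepted flag values, and the sample rows are all assumptions made for this example; the real headers in catalog.csv may differ, and this is not the code used by dl_and_segment.py.

```python
import csv
import io

# Hypothetical column name; check the real catalog.csv header before use.
CORRECTED_COLUMN = "hand_corrected"

# Values treated as "yes" in this sketch (an assumption, not the repo's rule).
TRUTHY = {"yes", "y", "true", "1"}


def corrected_entries(csv_file):
    """Return the catalog rows whose hand-corrected flag is set."""
    reader = csv.DictReader(csv_file)
    return [
        row
        for row in reader
        if (row.get(CORRECTED_COLUMN) or "").strip().lower() in TRUTHY
    ]


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "title,audio_url,hand_corrected\n"
    "Example A,https://example.org/a.mp3,yes\n"
    "Example B,https://example.org/b.mp3,no\n"
)
rows = corrected_entries(sample)
print([row["title"] for row in rows])
```

To try this against the real file, replace the in-memory sample with open("catalog.csv", newline="", encoding="utf-8") and adjust the column name to match the actual header.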

Running python dl_and_segment.py --download --segment --gen_lexicon --purge performs the following steps:

Download audio files for each of the texts that are marked in the catalog as having been hand-corrected.
Use aeneas to find the timestamps in the audio corresponding to each sentence in the text, and create segmented audio/text pairs. The texts are produced in three versions: yivo_respelled (YIVO orthography with precombined Unicode characters and with Hebrew/Aramaic-origin words respelled phonetically); yivo_original (YIVO orthography with precombined Unicode characters and no respellings); and hasidic (yivo_original respelled according to Hasidic orthographic norms, including the removal of all diacritics).
Create a lexicon (for each orthography) to be used with the Montreal Forced Aligner.
Purge audio files that are too short to be used with the MFA.
Finally, print some commands to the screen to train and run the MFA.
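
The difference between "precombined Unicode characters" and "removal of all diacritics" in the text versions above can be illustrated with a small sketch. It is not the repository's implementation: the function names precombine and strip_diacritics are hypothetical, the mapping covers only a handful of letter-plus-diacritic pairs, and real Hasidic respelling involves more than stripping marks. The sketch relies only on Python's standard unicodedata module.

```python
import unicodedata

# Base letter + combining mark -> precombined presentation form.
# A partial, illustrative mapping; digraphs such as double vov are omitted.
PRECOMBINED = {
    "\u05D0\u05B7": "\uFB2E",  # alef + patah
    "\u05D0\u05B8": "\uFB2F",  # alef + qamats
    "\u05D1\u05BF": "\uFB4C",  # bet + rafe
    "\u05D5\u05BC": "\uFB35",  # vav + dagesh
    "\u05D9\u05B4": "\uFB1D",  # yod + hiriq
    "\u05F2\u05B7": "\uFB1F",  # double yod + patah
    "\u05DB\u05BC": "\uFB3B",  # kaf + dagesh
    "\u05E4\u05BC": "\uFB44",  # pe + dagesh
    "\u05E4\u05BF": "\uFB4E",  # pe + rafe
    "\u05E9\u05C2": "\uFB2B",  # shin + sin dot
    "\u05EA\u05BC": "\uFB4A",  # tav + dagesh
}


def precombine(text):
    """Replace letter + diacritic pairs with single precombined code points."""
    # NFD first splits any letters that are already precombined, so mixed
    # input is handled uniformly before the mapping is applied.
    result = unicodedata.normalize("NFD", text)
    for pair, single in PRECOMBINED.items():
        result = result.replace(pair, single)
    return result


def strip_diacritics(text):
    """Remove all nonspacing combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# pe+rafe, alef+patah, resh, written with separate combining marks.
word = "\u05E4\u05BF\u05D0\u05B7\u05E8"
print([hex(ord(ch)) for ch in precombine(word)])
print([hex(ord(ch)) for ch in strip_diacritics(word)])
```

An explicit mapping is used because Unicode's NFC normalization does not produce these Hebrew presentation forms; they are excluded from canonical composition.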

All of the files created by the above steps will be available in an untracked directory called generated/. Speaker codes are based on dialects, e.g., lit1, lit2 (for Lithuanian Yiddish), pol1 (for Polish Yiddish).
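
Speaker codes of this shape are easy to split into a dialect and a speaker number. The following is a minimal sketch, assuming every code is a lowercase dialect prefix followed by digits; the function name parse_speaker_code is hypothetical, and only the two prefixes mentioned above (lit, pol) are mapped.

```python
import re

# Only the prefixes named in this README; others fall back to "unknown dialect".
DIALECT_PREFIXES = {
    "lit": "Lithuanian Yiddish",
    "pol": "Polish Yiddish",
}

# Assumed shape: lowercase letters followed by one or more digits.
SPEAKER_CODE = re.compile(r"([a-z]+)(\d+)")


def parse_speaker_code(code):
    """Split a code such as 'lit2' into (dialect name, speaker number)."""
    match = SPEAKER_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognized speaker code: {code!r}")
    prefix, number = match.groups()
    return DIALECT_PREFIXES.get(prefix, "unknown dialect"), int(number)


for code in ("lit1", "lit2", "pol1"):
    print(code, parse_speaker_code(code))
```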

Running bash prep_dataset.sh creates a publishable TTS dataset in generated/dataset/.
