dictfix

DictFix.txt: Corrections to our phonetic dictionary.

As part of our intelligibility computation pipeline, we convert English words into their pronunciations using an automatic grapheme-to-phoneme conversion step. We can provide the converter with a list of "dictionary fixes" to specify how to convert certain words. This repository houses that list of exceptions in the file DictFix.txt.

Notes

When I update the list, I just put the new entries on the bottom and then sort the file. The included bash script sort.sh sorts the dictionary file, checks for duplicate entries with conflicting pronunciations, and validates the format of the entries.

./sort.sh
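The conflict check can be sketched roughly as follows. This is not the contents of sort.sh. It is a minimal illustration that assumes each line holds a word, then whitespace, then its pronunciation, which may not match the actual DictFix.txt layout. The function name find_conflicts is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: assumes each line is "WORD<whitespace>PRONUNCIATION".
# That layout is an assumption, not the documented DictFix.txt format.

# Print every word that appears with more than one distinct pronunciation.
find_conflicts() {
  awk '
    NF == 0 { next }                      # skip blank lines
    {
      word = $1
      # Treat everything after the first field as the pronunciation.
      pron = $0
      sub(/^[^ \t]+[ \t]+/, "", pron)
      if (!(word in seen)) {
        seen[word] = pron                 # remember the first pronunciation
      } else if (seen[word] != pron) {
        conflict[word] = 1                # a later entry disagrees with it
      }
    }
    END { for (w in conflict) print w }
  ' "$1" | sort
}
```

Calling find_conflicts DictFix.txt would print one conflicting word per line and print nothing if every repeated word agrees with its first entry.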

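One can also just sort and deduplicate with the built-in sort. This only removes duplicate lines; it does not check for conflicting pronunciations or validate the format of the entries.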

sort --unique --output DictFix.txt DictFix.txt

To download the latest dictionary from GitHub, use the following:

curl -s https://raw.githubusercontent.com/WISCLab/dictfix/master/DictFix.txt > DictFix.txt

Extras

The directory mfa includes an additional set of pronunciation fixes for the dictionary provided by the Montreal Forced Aligner. It has its own README.md file.

The script util-delete-empty-rating-files.sh is a helper for dealing with listening experiment data downloaded from our lab's database. It deletes listener rating files when they contain 0 rows. (They should never contain 0 rows, so an empty file is a sign that data is missing somewhere.) It's useful to have around when running ShowAndTell on a large number of files.
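For a sense of what such a check might look like, here is a minimal sketch. It is not the helper script itself. It assumes the rating files are .csv files with a single header line, so a file with 0 rows has at most one line. Both the extension and the header are assumptions, and the function name list_empty_rating_files is hypothetical. It only lists the files rather than deleting them.

```bash
# Sketch only: assumes rating files are *.csv with one header line,
# so "0 rows" means the file has at most one line. Both are assumptions.

# List (without deleting) the rating files in a directory that have no data rows.
list_empty_rating_files() {
  local dir=$1 file lines
  for file in "$dir"/*.csv; do
    [ -f "$file" ] || continue            # skip the unexpanded glob if no match
    # awk counts the final line even without a trailing newline, unlike wc -l.
    lines=$(awk 'END { print NR }' "$file")
    if [ "$lines" -le 1 ]; then
      printf '%s\n' "$file"
    fi
  done
}
```

Listing first makes it possible to review which files are empty, and to chase down the missing data, before removing anything.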
