A collection of Praat scripts for acoustic analysis and speech manipulation, developed primarily for research on infant-directed speech (IDS).

| Script | Description |
|---|---|
| `syllable-nuclei-with-review.praat` | Detect syllable nuclei; compute speech and articulation rates with a manual review step |
| `extract_formants_IDS.praat` | Extract pitch ($f_0$) and formant measurements from `.wav` files |
| `IDS_manipulation.praat` | Generate 8 factorial acoustic manipulations ($f_0$ mean × $f_0$ SD × formant frequencies) |

`syllable-nuclei-with-review.praat` detects syllable nuclei in a WAV file and calculates two speech rate metrics:
- Speech rate: syllables / total duration
- Articulation rate: syllables / phonation time (pauses excluded)
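
The two metrics differ only in their denominator. A minimal Python sketch of the arithmetic follows; the function name and the example numbers are illustrative and do not come from the script itself:

```python
def speech_rates(n_syllables, total_duration_s, phonation_time_s):
    """Return (speech_rate, articulation_rate) in syllables per second."""
    if total_duration_s <= 0 or phonation_time_s <= 0:
        raise ValueError("durations must be positive")
    if phonation_time_s > total_duration_s:
        raise ValueError("phonation time cannot exceed total duration")
    speech_rate = n_syllables / total_duration_s        # pauses included
    articulation_rate = n_syllables / phonation_time_s  # pauses excluded
    return speech_rate, articulation_rate


# Hypothetical recording: 24 syllables in 10 s, of which 4 s are pauses.
speech_rate, articulation_rate = speech_rates(24, 10.0, 6.0)
print(f"speech rate: {speech_rate:.2f} syll/s")
print(f"articulation rate: {articulation_rate:.2f} syll/s")
```

Because pauses are removed from the denominator, the articulation rate is never lower than the speech rate.
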
After automatic detection, the script opens the Praat editor so the user can manually review and correct both the syllable tier and the utterance/silence segmentation before results are written to a CSV file.
- Sonority envelope: intensity is computed on a bandpass-filtered copy of the audio (default 100–6000 Hz, adjustable in the form), reducing false positives from low-frequency noise and fricative energy.
- Pitch-based utterance validation (VAD): each interval classified as `utterance` is validated against the Pitch object; intervals without sufficient vocal-fold vibration are reclassified as `silent`.
- Interactive WAV file selection via the system file chooser.
- Results are appended to a cumulative CSV in the same folder as the audio, so all recordings in a folder accumulate in a single output file.
- Compatible with Linux, Mac, and Windows.
- De Jong, N. & Wempe, T. (2009). Praat script to detect syllable nuclei and measure speech rate automatically. Behavior Research Methods, 41(2), 385–390. https://doi.org/10.3758/BRM.41.2.385
- Quené, H., Persoon, I. & De Jong, N. (2010). Modified version of the original script [version 2010.09.17]. https://github.com/FieldDB/Praat-Scripts/blob/main/praat-script-syllable-nuclei-v2dir.praat
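
The pitch-based validation step of `syllable-nuclei-with-review.praat` can be pictured with the Python sketch below. It is not the script's code: the README does not state how "sufficient" voicing is defined, so the voiced-frame fraction and its `min_voiced_fraction` threshold are assumptions, as is the plain-list representation of the pitch track.

```python
def validate_utterances(intervals, frame_times, f0_values, min_voiced_fraction=0.2):
    """Reclassify 'utterance' intervals with too few voiced pitch frames as 'silent'.

    min_voiced_fraction is a hypothetical threshold, not a script parameter.
    """
    validated = []
    for start, end, label in intervals:
        if label == "utterance":
            frames = [f0 for t, f0 in zip(frame_times, f0_values) if start <= t < end]
            voiced = sum(1 for f0 in frames if f0 > 0)
            if not frames or voiced / len(frames) < min_voiced_fraction:
                label = "silent"
        validated.append((start, end, label))
    return validated


# Toy pitch track: one frame every 0.1 s; 0.0 marks an unvoiced frame.
frame_times = [i / 10 for i in range(30)]
f0_values = [220.0] * 10 + [0.0] * 20  # voiced only during the first second
intervals = [(0.0, 1.0, "utterance"), (1.0, 2.0, "silent"), (2.0, 3.0, "utterance")]
result = validate_utterances(intervals, frame_times, f0_values)
print(result)
```

In this toy input the third interval contains no voiced frames, so its label changes from `utterance` to `silent`.
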
`extract_formants_IDS.praat` extracts pitch ($f_0$) and formant measurements from `.wav` files and writes one row per file to a CSV.

| Parameter | Description | Default |
|---|---|---|
| `directory` | Folder containing the `.wav` files | — |
| `resultsfile` | Full path of the output CSV | — |
| `minimum_pitch` / `maximum_pitch` | Pitch ($f_0$) analysis range | 100 / 600 Hz (female) |
| `maximum_formant` | Formant analysis ceiling | 5500 Hz (female/children) |

- Default settings are tuned for female speakers. For male speakers, use `minimum_pitch = 75`, `maximum_pitch = 300`, and `maximum_formant = 5000`.
- Undefined values (unvoiced or silent segments) are replaced with `0`.
- Formant analysis uses the Burg method (5 formants, 25 ms window, pre-emphasis from 50 Hz), matching the settings of Hilton et al. (2022).
- Adapted from `analysis/acoustics_processing/3_masterscript.praat` in Hilton, C. B., Moser, C. J., et al. (2022). Acoustic regularities in infant-directed speech and song across cultures. Nature Human Behaviour, 6, 1545–1556. https://doi.org/10.1038/s41562-022-01410-x
- Original script repository: https://github.com/themusiclab/infant-speech-song
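
For downstream processing of the CSV from `extract_formants_IDS.praat`, the speaker-dependent settings and the zero-fill convention can be mirrored in a few lines of Python. This is only a sketch: the `PRESETS` dictionary and `clean_measure` helper are illustrative names, and treating `None` or NaN as "undefined" is an assumption about how the values would appear outside Praat.

```python
import math

# Values copied from the parameter table and notes for this script.
PRESETS = {
    "female": {"minimum_pitch": 100, "maximum_pitch": 600, "maximum_formant": 5500},
    "male": {"minimum_pitch": 75, "maximum_pitch": 300, "maximum_formant": 5000},
}


def clean_measure(value):
    """Map an undefined measurement (None or NaN here) to 0."""
    if value is None or math.isnan(value):
        return 0
    return value


# Hypothetical row: one defined value and two undefined ones.
row = [clean_measure(v) for v in (212.4, float("nan"), None)]
print(PRESETS["male"], row)
```

Since `0` stands for "undefined" rather than a measured value, it is worth filtering such cells out before averaging across files.
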
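`IDS_manipulation.praat` generates all 8 combinations of a 2 ($f_0$ mean: high/low) × 2 ($f_0$ SD: high/low) × 2 (formant frequencies: high/low) factorial design. Each combination is saved as a `.wav`, and a verification report with pre- and post-manipulation acoustic values is saved as a `.txt` file.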
Manipulation magnitudes are set to 1 × the between-speaker SD of IDS acoustics in female speakers reported by Hilton et al. (2022, Nature Human Behaviour; n = 21 societies):

| Parameter | Offset | Default |
|---|---|---|
| $f_0$ mean | ± n Hz | ±51 Hz (50.8 Hz) |
| $f_0$ SD | ± n Hz | ±28 Hz (28.4 Hz) |
| Formants (F1–F4) | ± proportion | ±4.6% (CV = 0.046) |

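
To make the magnitudes concrete, the Python sketch below turns a speaker's baseline values into targets for one cell of the design. It assumes that the $f_0$ offsets are additive and that the formant target is the baseline scaled by `1 ± factor_formant`; the baseline numbers and the `condition_target` helper are hypothetical.

```python
SIGN = {"High": +1, "Low": -1}


def condition_target(baseline, f0_mean_level, f0_sd_level, formant_level,
                     sd_f0mean_hz=51, sd_f0sd_hz=28, factor_formant=0.046):
    """Return the acoustic targets for one High/Low combination."""
    return {
        "f0_mean_hz": baseline["f0_mean_hz"] + SIGN[f0_mean_level] * sd_f0mean_hz,
        "f0_sd_hz": baseline["f0_sd_hz"] + SIGN[f0_sd_level] * sd_f0sd_hz,
        "formant_gmean_hz": baseline["formant_gmean_hz"]
        * (1 + SIGN[formant_level] * factor_formant),
    }


# Hypothetical baseline measured on the unmanipulated recording.
baseline = {"f0_mean_hz": 250.0, "f0_sd_hz": 60.0, "formant_gmean_hz": 2000.0}
target = condition_target(baseline, "High", "Low", "High")
print(target)
```

A recording whose $f_0$ SD is below 28 Hz would get a negative target in the Low SD condition under this additive reading, so it is worth checking baselines before running a batch.
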
- Formant shift (Step 1): LPC-based resynthesis applied to the original audio. All formant frequencies are scaled proportionally by a ratio that is adjusted iteratively until the geometric mean of F1–F4 in the output matches the target within `ffreq_tolerance_hz`. No PSOLA is used in this step; pitch is left untouched.
- $f_0$ manipulation (Step 2): Iterative affine correction via PSOLA. A linear transform (slope and intercept) is computed to shift the $f_0$ mean and SD to their targets. After each resynthesis, the output SD is measured; if it deviates from the target by more than `sd_tolerance_hz`, the slope is corrected by `target_sd / measured_sd` and the process repeats. Typically converges in 2–3 iterations (maximum `max_iter`).
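
Both steps are measure-and-correct loops, which the Python sketch below imitates without Praat. The resynthesis is replaced by stand-in functions (`fake_measure`, `fake_resynthesize`) that deliberately undershoot, and the multiplicative update of the formant ratio is an assumption, since the README only says the ratio is "adjusted iteratively". The slope correction in Step 2 follows the `target_sd / measured_sd` rule described above.

```python
from statistics import geometric_mean, mean, stdev


def fit_formant_ratio(measure_gmean, source_gmean_hz, target_gmean_hz,
                      ffreq_tolerance_hz=10.0, max_ffreq_iter=12):
    """Step 1: tune the scaling ratio until the measured geometric mean of
    F1-F4 is within tolerance of the target."""
    ratio = target_gmean_hz / source_gmean_hz
    for iteration in range(1, max_ffreq_iter + 1):
        measured = measure_gmean(ratio)  # stands in for resynthesis + measurement
        if abs(measured - target_gmean_hz) <= ffreq_tolerance_hz:
            break
        ratio *= target_gmean_hz / measured  # assumed update rule
    return ratio, measured, iteration


def fit_f0_transform(f0, target_mean, target_sd, resynthesize,
                     sd_tolerance_hz=0.5, max_iter=5):
    """Step 2: affine f0 transform with iterative slope correction."""
    source_mean, source_sd = mean(f0), stdev(f0)
    slope = target_sd / source_sd
    for iteration in range(1, max_iter + 1):
        intercept = target_mean - slope * source_mean
        output = resynthesize([slope * v + intercept for v in f0])
        measured_sd = stdev(output)
        if abs(measured_sd - target_sd) <= sd_tolerance_hz:
            break
        slope *= target_sd / measured_sd
    return output, slope, iteration


# Hypothetical formant values (Hz) and a stand-in that realises only 80 %
# of the requested shift.
source_gmean = geometric_mean([700.0, 1800.0, 2900.0, 4000.0])
target_gmean = source_gmean * 1.046


def fake_measure(ratio):
    return source_gmean * (1 + 0.8 * (ratio - 1))


ratio, measured_gmean, formant_iters = fit_formant_ratio(
    fake_measure, source_gmean, target_gmean)


# Hypothetical f0 contour (Hz) and a stand-in that compresses excursions by 10 %.
def fake_resynthesize(contour):
    centre = mean(contour)
    return [centre + 0.9 * (v - centre) for v in contour]


f0 = [200.0, 220.0, 260.0, 240.0, 230.0, 250.0]
target_mean, target_sd = mean(f0) + 51, stdev(f0) + 28
output, slope, f0_iters = fit_f0_transform(f0, target_mean, target_sd, fake_resynthesize)
print(formant_iters, f0_iters)
```

With these stand-ins both loops stop on their second pass. Real audio will behave less tidily, which is why the script caps the number of iterations.
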

| Parameter | Description | Default |
|---|---|---|
| `Input_file` | Full path to the input `.wav` file | — |
| `Output_dir` | Output folder (trailing `/` required) | — |
| `sd_f0mean_hz` | $f_0$ mean offset (Hz) | 51 |
| `sd_f0sd_hz` | $f_0$ SD offset (Hz) | 28 |
| `factor_formant` | Formant scaling factor | 0.046 |
| `pitch_floor_hz` / `pitch_ceiling_hz` | $f_0$ analysis range | 100 / 500 Hz |
| `sd_tolerance_hz` | Convergence tolerance for $f_0$ SD | 0.5 Hz |
| `max_iter` | Maximum iterations for $f_0$ manipulation | 5 |
| `ffreq_tolerance_hz` | Convergence tolerance for formants | 10 Hz |
| `max_ffreq_iter` | Maximum iterations for formants | 12 |
| `max_formant_synth_hz` | Formant ceiling for synthesis and measurement | 5500 Hz |

- 8 `.wav` files named `{name}_f0mean-{High/Low}_f0SD-{High/Low}_Ffreq-{High/Low}.wav`
- 1 `.txt` report named `{name}_mean±{n}Hz_sd±{n}Hz_formant±{n}pct.txt`
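
When collecting the outputs in a later analysis step, the eight `.wav` names can be rebuilt from the pattern above. The Python sketch below does this for a hypothetical input called `mother01`; the order in which the script writes the files is not documented here, so the ordering is an assumption.

```python
from itertools import product


def output_names(name):
    """List the 8 expected .wav file names for one input recording."""
    levels = ("High", "Low")
    return [
        f"{name}_f0mean-{m}_f0SD-{s}_Ffreq-{f}.wav"
        for m, s, f in product(levels, repeat=3)
    ]


names = output_names("mother01")  # hypothetical input name
for wav in names:
    print(wav)
```
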
- Manipulation magnitudes: Hilton, C. B., et al. (2022). Acoustic regularities in infant-directed speech and song across cultures. Nature Human Behaviour, 6, 1545–1556. https://doi.org/10.1038/s41562-022-01410-x
- Formant manipulation adapted from `changeformants.praat`, Praat Vocal Toolkit
Each script carries its own license header because they derive from different upstream sources:

| Script | License | Upstream |
|---|---|---|
| `syllable-nuclei-with-review.praat` | GPL-3.0-or-later | de Jong & Wempe (2008); Quené et al. (2010) via FieldDB/Praat-Scripts |
| `extract_formants_IDS.praat` | CC BY-NC-SA 4.0 | Hilton, Moser et al. (2022) via themusiclab/infant-speech-song |
| `IDS_manipulation.praat` | GPL-2.0-or-later | Original code; formant shift procedure adapted from Praat Vocal Toolkit `changeformants.praat` (verify license) |

Note: `extract_formants_IDS.praat` is released under CC BY-NC-SA 4.0, which restricts commercial use. The other two scripts are GPL and permit commercial use. These licenses are not mutually compatible, so a single repo-wide license is not applied.