Basic concatenative speech synthesizer in English.
A basic speech synthesizer that converts input text into a sound waveform containing intelligible speech. It uses a very simple unit selection and waveform concatenation system in which the acoustic units are individual recordings of diphones.
sympleaudio.py contains an Audio class that acts as an interface with the audio hardware, enabling operations such as saving, loading and playing .wav files.

synth.py is the main program and contains the following classes:

- Utterance: Performs basic word tokenization and normalization of the input text, converts words to phonemic transcriptions and extracts a sequence of diphones.
- Synth: Takes in a sequence of diphones, reads the contents of their corresponding .wav files into Audio objects, and concatenates them, so that the output can be played and/or saved as another .wav file.
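The two steps described for Utterance and Synth (extracting a diphone sequence, then concatenating the matching recordings) can be illustrated with a minimal sketch. This is not the code in synth.py: it uses Python's standard wave module instead of the project's Audio class, and the function names, the "pau" pause symbol and the "left-right.wav" file naming are assumptions made only for this example.

```python
import wave
from pathlib import Path


def phones_to_diphones(phones):
    """Pad the phone sequence with a pause symbol and pair each phone with its successor.

    The "pau" symbol and the "left-right" naming are hypothetical conventions.
    """
    padded = ["pau"] + list(phones) + ["pau"]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def concatenate_diphones(diphones, diphone_dir, out_path):
    """Append the frames of each diphone .wav file, in order, into one output file."""
    params = None
    frames = []
    for name in diphones:
        with wave.open(str(Path(diphone_dir) / f"{name}.wav"), "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            if params is None:
                params = fmt  # the first file fixes the output format
            elif fmt != params:
                raise ValueError(f"{name}.wav has a different audio format")
            frames.append(wav.readframes(wav.getnframes()))
    if params is None:
        raise ValueError("no diphones to concatenate")
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        out.writeframes(b"".join(frames))
```

Under these assumptions, phones_to_diphones(["h", "ay"]) yields the three units "pau-h", "h-ay" and "ay-pau". The recordings are joined end to end with no smoothing at the boundaries, which is the simplest form of waveform concatenation.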
- directory: path to the directory that contains the .wav files for diphone sounds. Diphones go from the middle of one speech sound to the middle of the next one, capturing the transition between the two sounds.
- --text: text to be synthesized.
- --diphones: directory containing .wav files (./directory by default).
- --play (default): play synthesized output.
- --save: output .wav filename.
python synth.py --text "I sound like a robot" --save robot.wav --play
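For readers who want to see how a command line like this might be parsed, here is a minimal sketch using Python's argparse module. It is not taken from synth.py: the build_parser name is hypothetical, and treating --text as required and --play as a simple on/off flag are assumptions. Only the option names and the ./directory default come from the list of options in this README.

```python
import argparse


def build_parser():
    """Build a parser for the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Basic diphone speech synthesizer")
    # Assumption: text is mandatory, since there is nothing to synthesize without it.
    parser.add_argument("--text", required=True, help="text to be synthesized")
    parser.add_argument("--diphones", default="./directory",
                        help="directory containing .wav files")
    # Assumption: --play is modelled as a plain flag.
    parser.add_argument("--play", action="store_true", help="play synthesized output")
    parser.add_argument("--save", metavar="FILE", help="output .wav filename")
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["--text", "I sound like a robot", "--save", "robot.wav", "--play"]
)
print(args.text, args.diphones, args.save, args.play)
```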