This repository implements Active Speech Recognition (ASR). Users speak into their microphone, and the transcribed text is published via a ZMQ socket.
ASR depends primarily on the following:
- Sounddevice: Microphone selection and audio input stream.
- Silero-VAD: Used to detect voice activity and determine end of utterance (EOU).
- WhisperX: Audio -> text transcription.
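
To make the end-of-utterance (EOU) idea concrete, here is a minimal sketch of one common approach: declare the utterance finished once a run of consecutive non-speech frames follows some speech. It operates on per-frame speech probabilities of the kind a VAD model produces, and it does not call Silero-VAD itself. The class name `EndOfUtteranceDetector`, the 0.5 threshold, and the three-frame silence window are assumptions for illustration, not values taken from this repository.

```python
from dataclasses import dataclass


@dataclass
class EndOfUtteranceDetector:
    """Flags end of utterance after a run of non-speech frames.

    All thresholds here are illustrative assumptions.
    """
    speech_threshold: float = 0.5  # probability at or above which a frame counts as speech
    silence_frames: int = 3        # consecutive non-speech frames that end an utterance
    _in_speech: bool = False
    _silent_run: int = 0

    def update(self, speech_prob: float) -> bool:
        """Feed one frame's speech probability; return True when an utterance just ended."""
        if speech_prob >= self.speech_threshold:
            self._in_speech = True
            self._silent_run = 0
            return False
        if not self._in_speech:
            return False  # leading silence, nothing to end yet
        self._silent_run += 1
        if self._silent_run >= self.silence_frames:
            # Enough trailing silence: close the utterance and reset state.
            self._in_speech = False
            self._silent_run = 0
            return True
        return False


# Hypothetical per-frame probabilities, as a VAD model might emit them.
probs = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.2, 0.1, 0.05]
detector = EndOfUtteranceDetector()
eou_frames = [i for i, p in enumerate(probs) if detector.update(p)]
print(eou_frames)
```

In a pipeline like the one described, the frame index at which `update` returns `True` would be the point where the buffered audio is handed to the transcription step. A short pause (the single low frame at index 3) does not end the utterance because the silence window is not yet full.
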
- Install the requirements by running the following: `pip3 install -r requirements.txt`
- Configure ASR settings.
  - Main configuration settings are located in `config/config.yaml`
  - To use another configuration file, pass the file path using `--config`
  - Override configuration settings using flags. For more information run `python3 main.py --h`
- Main file is located at `ASR/main.py`. Run this file using `python3 main.py`
- Begin speaking into the selected microphone.
- If you need to check if the text is being published, run `Examples/example_sub.py`
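
The configuration steps above (a default file, an alternative path via `--config`, and per-setting flag overrides) follow a common layering pattern, sketched below with `argparse`. Only `--config` and `config/config.yaml` come from this README. The setting names `sample_rate` and `language`, the flags `--sample-rate` and `--language`, and the helper `resolve_settings` are hypothetical, and a plain dictionary stands in for the parsed YAML file. Run `python3 main.py --h` for the flags the project actually supports.

```python
import argparse

# Stand-in for values loaded from config/config.yaml; these keys are hypothetical.
DEFAULT_CONFIG = {"sample_rate": 16000, "language": "en"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="ASR settings (illustrative)")
    parser.add_argument("--config", default="config/config.yaml",
                        help="path to the configuration file")
    # Hypothetical override flags; default=None means "not supplied".
    parser.add_argument("--sample-rate", type=int, default=None)
    parser.add_argument("--language", default=None)
    return parser


def resolve_settings(argv, file_config):
    """Return (config path, settings) with supplied flags overriding file values."""
    args = build_parser().parse_args(argv)
    settings = dict(file_config)  # copy so the file values stay untouched
    overrides = {"sample_rate": args.sample_rate, "language": args.language}
    settings.update({k: v for k, v in overrides.items() if v is not None})
    return args.config, settings


config_path, settings = resolve_settings(["--language", "de"], DEFAULT_CONFIG)
print(config_path, settings)
```

Using `None` as the flag default lets the code distinguish "flag not given" from an explicit value, so file settings are only replaced when the user actually passes an override.
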
- Allow configurable VAD and transcription models.
- Create an interface for microphone selection/settings.
- Enable different data transport architectures beyond standard PUB/SUB.