Summary
Telegram supports voice messages. I would like a received voice message to be downloaded, transcribed into text, and passed to the LLM as text so the agent can reply.
This helps when, for example, you are on the go and don't have time to type, so you can just send a voice message instead.
Check the different APIs available for speech-to-text (STT) and offer perhaps two different integrations.
Compare, for example, Deepgram vs. AssemblyAI or any other providers out there. The focus is on English; we don't need a rich language model that supports many languages, so let's keep it simple.
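To make the requested flow concrete, here is a rough sketch of how a voice message could be turned into text before it reaches the LLM. All names in it (`Transcriber`, `FakeTranscriber`, `message_to_text`, the `download` callback) are hypothetical and not part of any existing codebase or provider SDK. It only assumes that an incoming Telegram message carries a `voice` object with a `file_id`, as the Bot API describes, and that each STT integration can be wrapped behind one small interface so two providers stay swappable.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class Transcriber(Protocol):
    """Hypothetical interface that each STT integration would implement."""

    def transcribe(self, audio: bytes, mime_type: str) -> str: ...


@dataclass
class FakeTranscriber:
    """Stand-in for a real provider client (e.g. Deepgram or AssemblyAI)."""

    canned_text: str

    def transcribe(self, audio: bytes, mime_type: str) -> str:
        # A real integration would send `audio` to the provider here.
        return self.canned_text


def message_to_text(
    message: dict,
    download: Callable[[str], bytes],
    stt: Transcriber,
) -> Optional[str]:
    """Return the text to pass to the LLM, or None if there is nothing usable.

    `message` is a Telegram message as a plain dict. `download` is an
    assumed helper that fetches the audio bytes for a given file_id
    (in practice via the Bot API's getFile method).
    """
    if "text" in message:
        return message["text"]

    voice = message.get("voice")
    if voice is None:
        return None  # neither text nor voice: nothing to forward

    audio = download(voice["file_id"])
    # Telegram voice notes are OGG/Opus; mime_type is optional in the payload.
    mime_type = voice.get("mime_type", "audio/ogg")
    transcript = stt.transcribe(audio, mime_type).strip()
    return transcript or None
```

With this shape, switching between the two integrations would only mean passing a different `Transcriber` implementation; the rest of the message handling stays the same.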
Acceptance Criteria