[FEATURE] Add support for voice-to-text when using Telegram or any of the channels #433

@edenreich

Summary

Telegram supports voice messages. I would like a voice message to be transcribed to text once it has been downloaded, and that text passed to the LLM so the agent can reply.
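A minimal sketch of the requested flow, in Python for illustration only. `IncomingMessage`, `handle_message`, and the three injected callables (`download_voice`, `transcribe`, `ask_llm`) are hypothetical names, not the project's actual types or functions; how the audio is fetched from the channel is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical channel-agnostic message, not the project's real type."""
    text: Optional[str] = None
    voice_file_id: Optional[str] = None


def handle_message(
    msg: IncomingMessage,
    download_voice: Callable[[str], bytes],
    transcribe: Callable[[bytes], str],
    ask_llm: Callable[[str], str],
) -> str:
    """Turn a voice message into text, then send that text to the LLM."""
    if msg.voice_file_id is not None:
        # Voice path: download the audio, then transcribe it to a text prompt.
        audio = download_voice(msg.voice_file_id)
        prompt = transcribe(audio)
    else:
        # Text path: use the typed message as-is.
        prompt = msg.text or ""
    return ask_llm(prompt)
```

Passing the steps in as callables keeps the handler independent of any one channel or STT provider, which may also make the "It's tested" criterion easier to meet with simple fakes.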

This helps when, for example, you are on the go and don't have time to type, so you can just send a voice message instead.

Check the different APIs available for Speech to Text (STT) and offer perhaps 2 different integrations.

Compare, for example, Deepgram vs. AssemblyAI or any other providers out there. The focus is on English; we don't need a rich language model that supports many languages. Let's keep it simple.
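One way the two integrations could sit behind a common interface is sketched below, in Python for illustration only. The `Transcriber` protocol, the two provider classes, and `build_transcriber` are hypothetical names. The actual provider API calls are deliberately left out, since the request and response details need to be taken from each provider's documentation.

```python
from typing import Dict, Protocol, Type


class Transcriber(Protocol):
    """Hypothetical interface that every STT integration would implement."""

    def transcribe(self, audio: bytes, language: str = "en") -> str: ...


class DeepgramTranscriber:
    """Placeholder: the real call to Deepgram is intentionally omitted."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def transcribe(self, audio: bytes, language: str = "en") -> str:
        raise NotImplementedError("call the provider's STT API here")


class AssemblyAITranscriber:
    """Placeholder: the real call to AssemblyAI is intentionally omitted."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def transcribe(self, audio: bytes, language: str = "en") -> str:
        raise NotImplementedError("call the provider's STT API here")


# Assumed config values; the real setting names are up to the project.
PROVIDERS: Dict[str, Type] = {
    "deepgram": DeepgramTranscriber,
    "assemblyai": AssemblyAITranscriber,
}


def build_transcriber(provider: str, api_key: str) -> Transcriber:
    """Pick an integration by its configured name."""
    try:
        return PROVIDERS[provider.lower()](api_key)
    except KeyError:
        raise ValueError(f"unknown STT provider: {provider!r}") from None
```

Defaulting `language` to `"en"` reflects the English-only focus while leaving room to change it later.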

Acceptance Criteria

  • Voice messages are supported and are being transcribed
  • The agent retains the voice messages locally, with configurable cleanup (for example, retain only the last 10 voice messages)
  • The LLM returns a reply when a voice message is submitted
  • It's tested
  • It's documented
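The retention criterion above could be sketched as follows, in Python for illustration only. `prune_voice_messages` and its `keep` parameter are hypothetical, and the sketch assumes the voice files are stored flat in a single directory and that "last" means most recently modified.

```python
from pathlib import Path
from typing import List


def prune_voice_messages(directory: Path, keep: int = 10) -> List[Path]:
    """Delete all but the `keep` most recently modified files.

    Returns the paths that were deleted.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Newest first, so everything past index `keep` is old enough to remove.
    files = sorted(
        (p for p in directory.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    removed = files[keep:]
    for path in removed:
        path.unlink()
    return removed
```

Running this right after each new voice message is saved would keep the directory at or below the configured limit.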

Metadata

Labels

enhancement (New feature or request)
