diff --git a/docs/5. Integrations/Speech-to-text and Text-to-speech in Glific.md b/docs/5. Integrations/Speech-to-text and Text-to-speech in Glific.md
deleted file mode 100644
index 842e0ac9d..000000000
--- a/docs/5. Integrations/Speech-to-text and Text-to-speech in Glific.md
+++ /dev/null
@@ -1,150 +0,0 @@
-

- - - - - - -
-6 minutes read | Level: Advanced | Last Updated: April 2026
-

-
-# Speech-to-Text and Text-to-Speech Capabilities in Glific
-
-This integration in Glific enables NGOs and organizations to offer real-time translation and transliteration in various Indian languages, ensuring effective communication with end users in their preferred languages. Webhook names still reference "Bhashini," but since January 2026, Google Gemini 2.5 Pro has been the provider for both speech-to-text and text-to-speech capabilities.
-
---- 
-
-### This integration can be especially useful in use cases such as:
-
-- Translating chatbot content for multilingual campaigns.
-- Enabling users to respond in regional languages.
-- Transliterating text from one script to another, for example writing Hindi words using English letters.
-
-
-
-## Steps to use Speech to Text in Glific Flows
-
-The `Speech-to-Text (STT)` function in Glific converts user-recorded audio messages into text. This is especially helpful when users prefer speaking over typing, or when typing in local languages is difficult.
-
-#### Step 1: Create a `Send Message` node asking users to send their responses as audio messages, if they prefer.
-
-#### Step 2: In the `Wait for Response` node, select `has audio` as the message response type and give it a Result Name. In the screenshot below, `speech` is used as the result name.
-
-Screenshot 2025-08-10 at 12 10 35 AM
-
-#### Step 3: Add a `Call Webhook` node. This is where the speech-to-text function is called.
-
-- By default, `Function` is selected. Leave this as it is.
-
-Screenshot 2025-12-03 at 9 13 38 AM
-
-- In the `Function` field, select the predefined function name `speech_to_text_with_bhasini` from the dropdown. This function calls the Gemini 2.5 Pro model to convert audio to text.
-
-Screenshot 2025-12-03 at 9 14 39 AM
-
-- Give the webhook a result name; any name works. In the screenshot example, it is named `bhashini_asr`.
-
-Screenshot 2025-12-03 at 9 14 59 AM
-
-#### Step 4: Click on `Function Body` (top right corner). You will see the following.
-
-Screenshot 2025-12-03 at 9 15 55 AM
-
-Add the parameters as shown in the screenshot below.
-
-Screenshot 2025-08-10 at 12 14 30 AM
-
-- `speech`: Set this to the result name given to the captured audio. In this example, the result is named `speech` (Step 2), so the value is `@results.speech.input`. If the audio note was saved as `query`, the value would be `@results.query.input`.
-- `contact`: Keep the value as given in the screenshot, `@contact`.
-
-#### Step 5: Once the webhook is set up, you can refer to the converted text as `@results.bhashini_asr.asr_response_text` anywhere inside the flow.
-Add a `Send Message` node and paste this variable to show the converted text to the user.
-
-
-Screenshot 2025-09-25 at 12 46 56 AM
-
-
-[Sample Flow](https://drive.google.com/file/d/1F5oJGRxE7G6RgpyG77q2srqnikUZMDab/view?usp=sharing) Click on the Sample Flow link to import it and explore how it works.
-
---- 
-
-## Steps to Integrate Text To Speech in Glific Flows
-
-The Text-to-Speech (TTS) function in Glific generates a voice note for any text message, whether it is typed by the end user or written by NGO staff. This allows organizations to make information more accessible, especially for end users who prefer audio over text.
-
-Screenshot 2025-09-25 at 12 51 19 AM
-
-
-#### Step 1: Create a `Send Message` node asking users to reply in text if they prefer.
-
-#### Step 2: In the `Wait for Response` node, select `has only the phrase` as the message response type and give it a Result Name. In the screenshot below, `result_3` is used as the result name.
-
-Screenshot 2025-08-10 at 12 27 34 AM
-
-#### Step 3: Create a `Call Webhook` node.
-
-- By default, `Function` is selected. Leave this as it is.
-
-Screenshot 2025-12-03 at 9 20 23 AM
-
-- In the `Function` field, select the predefined function name `nmt_tts_with_bhasini` from the dropdown. This function calls the Gemini 2.5 Pro model to convert text to audio.
-
-Screenshot 2025-12-03 at 9 21 20 AM
-
-- Give the webhook a result name; any name works. In the screenshot example, it is named `bhashini_tts`.
-
-Screenshot 2025-12-03 at 9 21 53 AM
-
-#### Step 4: Click `Function Body` (top right corner). You will see the following.
-
-Screenshot 2025-12-03 at 9 22 32 AM
-
-Add the parameters as shown in the screenshot below.
-
-Screenshot 2025-08-10 at 12 34 03 AM
-
-- `text`: Set this to the result captured for the user's response or query.
-- `Source_language`: The original language of the text.
-- `target_language`: The language in which the voice note will be generated.
-- If translation is not needed, keep both `Source_language` and `target_language` the same.
-- Supported Target Languages: `"tamil" "kannada" "malayalam" "telugu" "assamese" "gujarati" "bengali" "punjabi" "marathi" "urdu" "spanish" "english" "hindi"`
-
-#### Step 5: Create a `Send Message` node and add the voice note variable as an attachment.
-
-Use `@results.bhashini_tts.media_url` for the voice note. `bhashini_tts` is the webhook result name used in this example.
-
-- Go to `Attachments` in the `Send Message` node.
-- Select `Expression` from the dropdown.
-- Use the following expression: `@results.bhashini_tts.media_url`
-
-Screenshot 2025-09-25 at 12 56 50 AM
-
-Please note: To receive voice notes as output, the Glific instance must be linked to Google Cloud Storage for your organization. The voice notes generated by the webhook call are stored there.
-To set up Google Cloud Storage, [click here](https://glific.github.io/docs/docs/Pre%20Onboarding/Google%20Cloud%20Storage%20Setup%20-%20GCS).
-
-#### Step 6: To show the translated text, create another `Send Message` node and use `@results.bhashini_tts.translated_text`.
-
-Screenshot 2025-09-25 at 12 57 47 AM
-
-[Sample Flow](https://drive.google.com/file/d/1WCOLQMF-OgLVR7PNHXbggMSeDXMJbui7/view) Click on the Sample Flow link to import it and explore how it works.
-
-
-## Using OpenAI Speech Engine for Text-to-Speech
-
-Apart from Gemini, OpenAI can also be used as the speech engine to generate text-to-speech (TTS) responses. Users can try both options to see which gives better results for their audience and language preferences.
-
-### How to configure:
-- In the `Function Body`, set the speech engine to `open_ai`.
-
-Please note that this alternative is currently supported only when **the source and target languages are the same**.
-
-- Keep the remaining steps the same as those in the Text-to-Speech section above.
-
-Screenshot 2025-12-31 at 4 42 02 PM
-
---- 
-
-### Video of Showcase
-
-[Video Link](https://www.youtube.com/watch?v=zS83U9OJJzk)
-
-_Watch from the 25-minute mark for the Bhashini integration part._
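
The `Function Body` parameters described in the Speech-to-Text and Text-to-Speech steps above can be sketched in Python as plain dictionaries. This is only an illustration of the parameters the text names, not Glific's actual implementation: the helper functions are hypothetical, the real Function Body may contain additional keys that are visible only in the screenshots, and the `speech_engine` key name is an assumption (the document says only to "set the speech engine to `open_ai`").

```python
import json

# Target languages listed in the Text-to-Speech section of the document.
SUPPORTED_TARGET_LANGUAGES = {
    "tamil", "kannada", "malayalam", "telugu", "assamese", "gujarati",
    "bengali", "punjabi", "marathi", "urdu", "spanish", "english", "hindi",
}


def stt_function_body(result_name="speech"):
    """Parameters for `speech_to_text_with_bhasini` (hypothetical helper)."""
    return {
        # Audio captured by the `Wait for Response` node under `result_name`.
        "speech": f"@results.{result_name}.input",
        "contact": "@contact",
    }


def tts_function_body(text_expr, source_language, target_language,
                      speech_engine=None):
    """Parameters for `nmt_tts_with_bhasini` (hypothetical helper)."""
    source = source_language.lower()
    target = target_language.lower()
    if target not in SUPPORTED_TARGET_LANGUAGES:
        raise ValueError(f"unsupported target language: {target_language}")
    body = {
        "text": text_expr,
        "Source_language": source,   # key spelled as in the document
        "target_language": target,
    }
    if speech_engine is not None:
        # Per the document, OpenAI works only without translation.
        if speech_engine == "open_ai" and source != target:
            raise ValueError("open_ai requires source and target to match")
        body["speech_engine"] = speech_engine  # assumed key name
    return body


if __name__ == "__main__":
    print(json.dumps(stt_function_body("speech"), indent=2))
    print(json.dumps(tts_function_body("@results.result_3", "english", "hindi"),
                     indent=2))
```

Checking the target language and the `open_ai` same-language restriction before pasting values into the Function Body catches the two constraints the document states; the expression passed as `text_expr` (here `@results.result_3`) should follow whatever the Function Body screenshot shows for your flow.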