Rasa is an open source machine learning framework for automated text- and voice-based conversations. It understands messages, holds conversations, and connects to messaging channels and APIs.
In a contextual conversation, something beyond the previous step in the conversation plays a role in what should happen next. For example, if a user asks "How many?", it's not clear from the message alone what the user is asking about. In the context of the assistant saying, "You've got mail!", the response could be "You have five letters in your mailbox". In the context of a conversation about outstanding bills, the response could be, "You have three overdue bills". The assistant needs to know the previous action to choose the next action.
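In Rasa, this kind of context is captured in stories. A minimal sketch of the mail example as a Rasa story is shown below; the intent and action names (`check_mail`, `ask_how_many`, and so on) are illustrative assumptions, not names from any shipped configuration.

```yaml
# stories.yml (sketch): the same user message "How many?" leads to a
# different answer depending on the preceding steps in the story.
version: "3.1"
stories:
- story: how many letters
  steps:
  - intent: check_mail            # hypothetical intent: user asks about mail
  - action: utter_you_have_mail   # "You've got mail!"
  - intent: ask_how_many          # "How many?"
  - action: utter_letter_count    # "You have five letters in your mailbox"
- story: how many bills
  steps:
  - intent: check_bills           # hypothetical intent: user asks about bills
  - action: utter_outstanding_bills
  - intent: ask_how_many          # same intent as above, different context
  - action: utter_bill_count      # "You have three overdue bills"
```

Because the dialogue policy conditions on the whole story so far, the two occurrences of `ask_how_many` can be routed to different responses.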
This file contains the trained Wav2Lip + GAN model for lip sync.
The trial of this model was performed on Google Colab.
The model gives strong lip-sync results.
RESULTS OF WAV2LIP
https://drive.google.com/drive/folders/1Ww6DISQBdYbs1ojHBYe3mS-066Rdg7vz?usp=sharing
For testing the model, we first created the avatar using the First Order Motion Model.
Then we applied Wav2Lip + GAN to the output of the First Order Motion Model to evaluate the quality of the lip sync.
Click here to go to my research work.
Click here for the Python code using IBM Watson.
There are several APIs available to convert text to speech in Python. One such API is the Google Text-to-Speech API, commonly known as the gTTS API. gTTS is an easy-to-use tool that converts the entered text into audio, which can be saved as an MP3 file. The gTTS API supports several languages, including English, Hindi, Tamil, French, German, and many more. The speech can be delivered at either of two available speeds, fast or slow. However, as of the latest update, it is not possible to change the voice of the generated audio.
IBM Watson Text to Speech gives your brand a voice, enabling you to improve customer experience and engagement by interacting with users in their own languages using any written text. Increase accessibility for users with different abilities, provide audio options to avoid distracted driving, or automate customer service interactions to increase efficiencies.
Powerful real-time speech synthesis
Create Custom Voices
Convert text into natural-sounding speech using an API powered by Google’s AI technologies.
• Improve customer interactions with intelligent, lifelike responses
• Engage users with voice user interface in your devices and applications
• Personalize your communication based on user preference of voice and language
• Deploy Google’s groundbreaking technologies to generate speech with humanlike intonation. Built based on DeepMind’s speech synthesis expertise, the API delivers voices that are near human quality.
• Choose from a set of 220+ voices across 40+ languages and variants, including Mandarin, Hindi, Spanish, Arabic, Russian, and more. Pick the voice that works best for your user and application.
• Create a unique voice to represent your brand across all your customer touchpoints, instead of using a common voice shared with other organizations.
Train a custom voice model using your own audio recordings to create a unique and more natural sounding voice for your organization. You can define and choose the voice profile that suits your organization and quickly adjust to changes in voice needs without needing to record new phrases.
Take advantage of 90+ WaveNet voices built based on DeepMind’s groundbreaking research to generate speech that significantly closes the gap with human performance.
Personalize the pitch of your selected voice, up to 20 semitones above or below the default, and adjust the speaking rate to be up to 4x faster or slower than the normal rate.
Customize your speech with SSML tags that allow you to add pauses, numbers, date and time formatting, and other pronunciation instructions.
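The pitch, rate, and pause controls above are expressed in SSML. The sketch below builds such a request body in plain Python; the tag names (`<speak>`, `<break>`, `<prosody>`) come from the SSML standard, but the helper function and its defaults are illustrative assumptions — check the Cloud Text-to-Speech documentation for exactly which attributes each voice supports.

```python
def build_ssml(text: str, pause_ms: int = 500, pitch_st: int = 2, rate: str = "95%") -> str:
    """Wrap plain text in SSML with a leading pause and pitch/rate adjustments."""
    return (
        "<speak>"
        f'<break time="{pause_ms}ms"/>'                      # pause before speaking
        f'<prosody pitch="+{pitch_st}st" rate="{rate}">'     # pitch in semitones, rate as a percentage
        f"{text}"
        "</prosody>"
        "</speak>"
    )

ssml = build_ssml("You have three overdue bills.")
# ssml == '<speak><break time="500ms"/><prosody pitch="+2st" rate="95%">You have three overdue bills.</prosody></speak>'
```

Number and date formatting would be handled the same way with SSML's `<say-as>` tag inside the text.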
Using the SciPy, NumPy, and Pillow libraries, we demonstrate removal of an image background.
The trial of this model was performed on Google Colab.
However, the result is not as we expected: the foreground and background are not segmented correctly.
We are exploring deep learning approaches such as the MODNet architecture and semantic segmentation to achieve the required output.
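A minimal sketch of the SciPy/NumPy/Pillow approach: it assumes a roughly uniform light background and a darker foreground, which is also why it fails on harder images. The threshold value and function name are our own choices for illustration.

```python
import numpy as np
from PIL import Image
from scipy import ndimage

def remove_background(img: Image.Image, threshold: int = 200) -> Image.Image:
    """Make near-white background pixels transparent, keeping the darker foreground."""
    rgba = np.array(img.convert("RGBA"))
    gray = np.array(img.convert("L"))
    # Foreground = pixels darker than the threshold; fill holes inside the mask
    mask = ndimage.binary_fill_holes(gray < threshold)
    rgba[..., 3] = np.where(mask, 255, 0).astype(np.uint8)
    return Image.fromarray(rgba)
```

Because the mask is a simple global threshold, any foreground region as bright as the background (or vice versa) is misclassified — exactly the segmentation failure we observed, and the motivation for trying learned approaches like MODNet instead.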
Team Edukut