Google announced the general availability of Cloud Text-to-Speech, new audio profiles that optimize sound for playback on different devices, enhancements to multi-channel recognition, and more.
Actually, to show you how meta we run at VoiceFirst Weekly: starting yesterday, Cloud Text-to-Speech offers multilingual access to voices generated using WaveNet, a machine learning technique developed by Alphabet subsidiary DeepMind. And this very bit is generated by a WaveNet voice for US English. WOW
Cloud Text-to-Speech offers 17 new WaveNet voices; in the episode notes you can find the link to all the available voices and languages. Wait, there is more. The company also announced audio profiles, which were previously available in beta.
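To make that concrete, here is a minimal sketch of what a request for a WaveNet voice looks like. It builds the JSON body for the `text:synthesize` REST endpoint as a plain Python dict; `en-US-Wavenet-D` is one of the published WaveNet voice names, but check the voices list linked below for the full set.

```python
import json

def build_synthesize_request(text, voice_name="en-US-Wavenet-D"):
    """Return a text:synthesize request body that asks for a WaveNet voice."""
    return {
        "input": {"text": text},
        "voice": {
            # The language code is the first two segments of the voice name,
            # e.g. "en-US" from "en-US-Wavenet-D".
            "languageCode": "-".join(voice_name.split("-")[:2]),
            "name": voice_name,
        },
        "audioConfig": {"audioEncoding": "MP3"},
    }

body = build_synthesize_request("Hello from VoiceFirst Weekly!")
print(json.dumps(body, indent=2))
```

POSTing a body like this to the synthesize endpoint (with your API credentials) returns base64-encoded audio in the chosen encoding.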
Audio profiles let you optimize the speech produced by Cloud Text-to-Speech’s APIs for playback on different types of hardware. You can create a profile for wearable devices with smaller speakers, for example, or one specifically for car speakers and headphones. It’s particularly handy for devices that don’t support specific frequencies; Cloud Text-to-Speech can automatically shift out-of-range audio to within hearing range, enhancing its clarity.
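In the request body, an audio profile is just an extra field on the audio config. The sketch below attaches a profile ID to an existing request; the ID shown (`large-automotive-class-device`, for car speakers) follows the documented naming scheme, but verify the current list of profile IDs in the API docs before relying on it.

```python
def with_audio_profile(request_body, profile_id):
    """Return a copy of a synthesize request body with a device profile set."""
    body = dict(request_body)
    audio_config = dict(body.get("audioConfig", {}))
    # effectsProfileId takes a list of profile IDs, applied in order.
    audio_config["effectsProfileId"] = [profile_id]
    body["audioConfig"] = audio_config
    return body

base = {
    "input": {"text": "Turn left in 200 meters."},
    "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-D"},
    "audioConfig": {"audioEncoding": "LINEAR16"},
}
car_request = with_audio_profile(base, "large-automotive-class-device")
```

The same base request can be re-profiled per device class, so one piece of text can be rendered once for headphones and once for a car speaker.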
The other features that are part of the announcement are multi-channel recognition, language auto-detect, and word-level confidence.
Multi-channel recognition offers an easy way to transcribe multiple channels of audio by automatically denoting the separate channels for each word.
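In practice, each result in a multi-channel response carries a channel tag you can group on. This is a hand-written sketch: the `mock_response` dict only mimics the shape of a real Speech-to-Text response (with separate recognition per channel enabled), so the helper can run without calling the API.

```python
def transcripts_by_channel(response):
    """Collect the top transcript of each result, grouped by channel tag."""
    channels = {}
    for result in response.get("results", []):
        tag = result.get("channelTag", 0)
        text = result["alternatives"][0]["transcript"]
        channels.setdefault(tag, []).append(text)
    return channels

# Stand-in for a real API response; not actual API output.
mock_response = {
    "results": [
        {"channelTag": 1, "alternatives": [{"transcript": "hello there"}]},
        {"channelTag": 2, "alternatives": [{"transcript": "hi how are you"}]},
        {"channelTag": 1, "alternatives": [{"transcript": "doing well thanks"}]},
    ]
}
print(transcripts_by_channel(mock_response))
```

Grouping like this is what makes, say, a two-party phone call come out as two separate transcripts instead of one interleaved blob.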
Language auto-detect lets you send up to four language codes at once in queries to Cloud Speech-to-Text. The API will automatically determine which language was spoken and return a transcript, much like how the Google Assistant detects languages and responds in kind. (Users also get the choice of selecting a language manually.)
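A recognition config for auto-detect names one primary language plus alternatives. This sketch builds that config as a dict, enforcing the four-codes-total limit from the announcement; the field names follow the documented JSON shape of the Speech-to-Text API.

```python
def recognition_config(primary, alternatives):
    """Build a Speech-to-Text config that lets the API detect the language."""
    codes = [primary] + list(alternatives)
    if len(codes) > 4:
        raise ValueError("at most four language codes per request")
    return {
        "encoding": "LINEAR16",
        "languageCode": primary,
        "alternativeLanguageCodes": list(alternatives),
    }

config = recognition_config("en-US", ["es-ES", "fr-FR", "de-DE"])
```

The response then reports which of the four languages it actually heard, so your app can answer in the same one.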
Word-level confidence offers developers fine-grained control over Google’s speech recognition engine. If you so choose, you can tie confidence scores to triggers within apps — like a prompt that encourages the user to repeat themselves if they mumble or speak too softly, for instance.
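Here is what that reprompt trigger could look like. The `words` list mimics the per-word scores you get back with word-level confidence enabled, and the 0.6 threshold is an arbitrary choice for illustration, not anything Google recommends.

```python
def needs_repeat(words, threshold=0.6):
    """Return True if any recognized word scored below the threshold."""
    return any(w["confidence"] < threshold for w in words)

# Mimics per-word results from a response with word confidence enabled.
mumbled = [
    {"word": "play", "confidence": 0.93},
    {"word": "some", "confidence": 0.88},
    {"word": "jazz", "confidence": 0.41},  # low score: likely mumbled
]

if needs_repeat(mumbled):
    print("Sorry, could you say that again?")
```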
The monthly free tier gives users up to 4 million characters for the standard voices and up to 1 million characters for the WaveNet voices.
End of synthesized voice.
Doesn’t that sound amazingly real? This is Mari, your usual host. I wanted to give you a real scoop on the Google WaveNet voices. I tried several WaveNet voices, and as the letters went up the alphabet, I felt the voices kept improving and sounded more natural. This one is WaveNet D. Synthesized voices that feel as natural as this one will have a huge impact.
Maybe I’ll let WaveNet-D host more episodes! Did it sound natural to you? Let me know what you think.
Remember to like, comment and subscribe, and tell me what you would like to hear me talk about in future episodes. Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really, I love to interact about this space. My name is Mari, and I will talk to you tomorrow.
Here are the resources for the new WaveNet voices:
How to create audio
Here is the pricing
And these are the other voices: