Month: August 2018

Text to speech services analysis

On Facebook, Roger Kibbe (voicecraft.ai), shout out to Roger, asked for text to speech services whose voices have more personality. Just in yesterday’s episode about internationalization, we talked about customizing voices in different languages and accents with text to speech services. But when it’s time to convey real emotion, it might be a little harder to find the voices we want for our apps. Synthetic voices, or text to speech services, come in two fundamental modes of consumption: one is a download model and the other is streaming. Pricing might be per minute of audio, per request in the case of streaming, a combination of both, or flat. So here is a compilation of voice services, their offerings and prices:

Text to speech services comparison

Cepstral: It’s a little pricey, so it’s probably best suited to established companies. Their demo has tons of voices that you can customize by pitch and rate, and you can add effects like Space Robot or Split Personality.

Acapela is a Belgian text to speech company. Their voice demo, the Acapela Box, has a collection of voices described as happy, bad guy, old man or child, among others. Acapela offers a voice banking service, which preserves your own voice as synthetic speech. Other offerings include voice creation, a service for companies that want to differentiate their marketing strategy through an identifiable corporate sound. Pricing for premium voices is based on a credit model where credits correspond to the length of the text (roughly the number of characters). Prices range from 6 euros for 47 seconds of audio to 600 euros for 96 minutes of audio.
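To make the two Acapela price points easier to compare, here is a small sketch that converts each bundle into a cost per minute of audio. It only uses the figures quoted above; the function name price_per_minute is made up for this example, and I'm assuming both prices are in euros.

```python
def price_per_minute(price_eur: float, audio_seconds: float) -> float:
    """Return the cost of one minute of audio for a given bundle."""
    return price_eur / (audio_seconds / 60.0)


# Figures quoted above: 6 euros for 47 seconds, 600 euros for 96 minutes.
smallest = price_per_minute(6, 47)
largest = price_per_minute(600, 96 * 60)

print(f"smallest bundle: {smallest:.2f} EUR per minute")
print(f"largest bundle: {largest:.2f} EUR per minute")
```

So the per-minute price drops somewhat as the bundle grows, but not dramatically.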

Voicery: the thing I didn’t love about this service is that they only have English voices. Given the current state of voice apps, I really prefer a service where I can choose from a range of voices in different languages. On the other hand, the voices did sound quite real. Like Acapela, Voicery offers voice creation, including the rights for companies to create their own voice. This is a streaming service and doesn’t offer support for on-device synthesis. They have two pricing packages: the enterprise package, where prices are provided on request, and the starter package, with up to 100 requests per second at 0.001 per character.

SpeechMorphing: I heard about SpeechMorphing at Voice Summit, and it seems to be a high quality service. They don’t have demo voices, you need to request a demo, so I’ll get back to this one in later episodes. It does show that you can customize the style of the voices, which is promising.

Cereproc offers a streaming service and an SDK for developers. The voices are really well crafted and pretty realistic, it is the service with the most demo voices, and the price is reasonable: 1,000,000 characters for 124 a month, up to almost 500 per month. Like other services, Cereproc offers voice creation and voice cloning as well. Voices are available in English, Dutch, French, Italian, Spanish, Portuguese, Japanese and other languages.

Talestreamer: High quality and the best quality-to-price ratio, with plans of 4/month for 25,000 requests and 16/month for 1,000,000 requests. Talestreamer is the service behind The Magic Door, in my opinion one of the best voice applications out there.
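Since these services bill in different units, a rough way to line them up is to compute the price per unit for each plan mentioned in this episode. This is only a sketch: the Plan class and its field names are invented for the example, the numbers are the ones quoted in this episode, and I'm leaving the currency out because not every price here names one. Keep in mind that a request and a character are not the same thing, so the figures are only comparable within the same unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    service: str
    monthly_price: float
    included_units: int
    unit: str  # what the plan counts: "character" or "request"

    @property
    def price_per_unit(self) -> float:
        return self.monthly_price / self.included_units


# Monthly plans as quoted in this episode.
PLANS = [
    Plan("Cereproc", 124, 1_000_000, "character"),
    Plan("Talestreamer (small)", 4, 25_000, "request"),
    Plan("Talestreamer (large)", 16, 1_000_000, "request"),
]

# Voicery's starter package is pay as you go: 0.001 per character.
VOICERY_PER_CHARACTER = 0.001


def voicery_cost(characters: int) -> float:
    """Cost of synthesizing the given number of characters with Voicery."""
    return characters * VOICERY_PER_CHARACTER


for plan in PLANS:
    print(f"{plan.service}: {plan.price_per_unit:.6f} per {plan.unit}")
print(f"Voicery: {voicery_cost(1_000_000):.2f} for 1,000,000 characters")
```

At a million characters a month, Voicery's per-character rate adds up to about eight times the Cereproc plan quoted above.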

Lastly, Amazon Polly, Amazon’s speech synthesis service. You can try it for free and customize it with SSML. It’s a great service for your voice apps or custom audio, but if you need something with more personality, or want to create your own voice for a brand identity, then you’d better stick with one of the others.
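To give an idea of what customizing with SSML looks like, here is a minimal sketch that wraps a sentence in standard SSML tags (speak, prosody and break). The helper name build_ssml and its parameters are hypothetical, and it only builds the markup string; it doesn't call Polly or any other service. Which attributes a given voice actually honors depends on the service, so treat the values as placeholders.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium",
               pause_ms: int = 0) -> str:
    """Wrap plain text in SSML, optionally preceded by a pause."""
    # escape() protects characters such as & and < that would break the XML.
    body = (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody>"
    )
    if pause_ms > 0:
        body = f'<break time="{int(pause_ms)}ms"/>' + body
    return f"<speak>{body}</speak>"


print(build_ssml("Thank you for listening & talk tomorrow!", rate="slow"))
```

The resulting string is what you would paste into a text to speech console or send as the SSML input of a request.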

The winners are Talestreamer, which has really good voices at a reasonable price, and Cereproc, which has voices in a lot of languages and provides an SDK plus a streaming service at a reasonable price, with the option to create more custom voices if needed. As always when choosing a service, it will depend on your needs.

Do you have a text to speech service to recommend? Have you tried any of these? Shoot us a message @voicefirstlabs on Twitter. In the episode notes at voicefirstweekly.com/flashbriefing/63 you can find the full transcript of the episode plus the links to each service mentioned.

Before wrapping up, I want to invite you to subscribe to our weekly newsletter. This morning we sent out issue number 15! Time flies. Thank you for listening, have a great day and we will talk tomorrow!

Internationalization on voice applications

Thank you Mathew for presenting the show today, welcome to this very special episode of VoiceFirst Weekly.

Availability in different languages has been in the news a lot recently for the popular smart assistants

During the past months, the availability of both Google Assistant and Alexa in different languages and regions has been constant news. The race to win in every language has never been greater. The website era will look so outdated in a few years. Text to speech advances will allow it, and soon people will expect applications to be available in every language possible. And let’s be clear: the biggest companies in the world today are available in dozens of languages. This trend is only going to grow and become an expectation.

You should not have to learn English to use the internet. The next billion users expect more content in their languages.

Caesar Sengupta, VP,  Next Billion Users Team at Google

That’s all good Mari, tell me more

So here’s what I think you should do: design your app so it can be adapted to various languages and regions with as little engineering effort as possible. This is probably the easiest part in the case of both Amazon Alexa and Google Assistant, as they offer options to add languages and locales. You don’t need to have tons of message files in your code. As long as you have the text translated, it works. But that’s not enough. Working with Disney Studios taught me the landmark work and relevance of distributing content to almost every country on earth. And I learned to think about internationalization upfront, not as a feature to be added later. It’s not only engineering work.

Internationalization requires 3 fundamentals:

  • Language translation
  • Culture translation, the translation linked to culture, like dialects, food, architecture and even jokes.
  • And Distribution. You might not know in the beginning all of the translations you want for your voice app, but it’s safe to assume you’ll eventually want it in at least one more language.

Be sure to account for all 3 fundamentals when planning, designing and writing your voice application. For voice apps on top of Google’s or Amazon’s smart assistants, distribution is embedded in the platforms, and you should probably focus more on discoverability and marketing in those locales.
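As a rough illustration of designing for more than one language from day one, here is a sketch of a locale-aware message lookup with fallbacks. Everything in it is an assumption made for the example: the MESSAGES table, the sample strings, the helper names, and the idea that the platform hands you a locale string shaped like es-MX. The point is only that adding a language or a regional variant should mean adding data, not changing code.

```python
# Hypothetical message table: keys are locales, values map message ids to text.
MESSAGES = {
    "en-US": {"welcome": "Welcome back!", "goodbye": "Talk to you tomorrow."},
    "es": {"welcome": "¡Bienvenido de nuevo!", "goodbye": "Hablamos mañana."},
    # A regional variant only overrides what differs from the base language.
    "es-MX": {"goodbye": "Nos hablamos mañana."},
}
DEFAULT_LOCALE = "en-US"


def candidate_locales(locale: str) -> list:
    """Most specific first: exact locale, bare language, then the default."""
    language = locale.split("-")[0]
    # dict.fromkeys removes duplicates while keeping the order.
    return list(dict.fromkeys([locale, language, DEFAULT_LOCALE]))


def translate(key: str, locale: str) -> str:
    """Return the best available text for a message id and a locale."""
    for candidate in candidate_locales(locale):
        text = MESSAGES.get(candidate, {}).get(key)
        if text is not None:
            return text
    raise KeyError(f"no translation for {key!r}")


print(translate("goodbye", "es-MX"))  # regional variant
print(translate("welcome", "es-MX"))  # falls back to the base language
print(translate("welcome", "fr-FR"))  # falls back to the default locale
```

The regional overrides are also a natural place for the culture translation part: a joke or a food reference can differ per region without touching the rest.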

Synthetic voices

The challenge for voice applications is similar in several regards to that of distributing animated movies and series. With Amazon Polly, Google Voices or Talestreamer, the key is to adapt the tone to the language, culture and locale, and to the users and application context. Be mindful and respectful when translating. Synthetic voices add a layer of communication style, along with SSML, the Speech Synthesis Markup Language, a standard for styling your voices.

This is a briefings show, and episodes are short

There’s so much to cover in this topic that one episode is just not enough! It doesn’t help that this is something I’m very passionate about.

What do you do if you don’t know any other language? Thank Twitter, go there and ask for help. The voice community is incredibly helpful.

In the notes of this episode at voicefirstweekly.com/flashbriefing/62 you can find YouTube and Google Assistant sound effects, Amazon Polly synthetic speech, and guides on how to add internationalization to your Google Actions and Alexa skills. I added some skills that are available in several languages and are rocking it!

Thank you for listening and we’ll talk tomorrow

P.S. There is an interesting emerging category called synthetic media, so stay tuned for future flash briefings on this subject. Meanwhile, Matt Hartman discusses the topic frequently in his newsletter Hearing Voices.

Resources

Amazon Polly synthetic voices

https://console.aws.amazon.com/polly/home/SynthesizeSpeech

Google sound library

https://developers.google.com/actions/tools/sound-library/

Internationalization in Google Actions

https://medium.com/orbismobile/hello-salut-hola-internationalization-in-your-google-actions-772a63989c10

Build a multilanguage Alexa skill

https://developer.amazon.com/blogs/post/Tx2XUAQ741IYQI4/how-to-build-a-multi-language-alexa-skill

Alexa Voice Service Prepare for internationalization

https://developer.amazon.com/docs/alexa-voice-service/prepare-for-internationalization.html

YouTube sound effects

https://www.youtube.com/audiolibrary/soundeffects

What’s next in conversational AI

Definition time, what’s conversational AI?

It’s about systems you can interact with in a conversation-like manner. It answers the need for humans and computers to interact more naturally. Even though this might look like a natural progression of what we do with our machines, it represents a major shift in computing. So the first step in defining conversational AI is to recognize that we are entering a completely new realm. Thus, when you see alarming news that your Alexa or Google Home can’t do this or that, or that your chatbot is not transforming your company’s customer service, it’s because consumers are just starting to notice these interfaces, and we are all learning how to build for better conversations and how to interact with these devices. Maybe the problem is how we are labeling the technology, because despite the tremendous progress that natural language processing, speech to text and machine learning have made in the past decades, we cannot say it’s conversational. It’s misleading. These limitations expose the challenges of natural language understanding. We know where we are now: people are interacting more with smart assistants every day. China’s Baidu DuerOS has reached more than 100 million devices.

But what’s next? Enter the VentureBeat article:

According to Martin, the first thing that has to come next for conversational AI is new tools beyond machine learning. Natural language generation should be the focus of the next set of tools in the conversational space.

The second is higher fidelity conversations. This involves wide and deep conversations, personalization and multimodality. It will require tracking the state of previous interactions and identifying each individual’s likes and style.
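To picture what tracking previous interactions and likes could mean in practice, here is a deliberately tiny sketch. It isn't how any particular assistant does it: the UserProfile class, its fields and the greet function are all invented for illustration, and counting topics is only a stand-in for real personalization.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    """What we remember about one user between conversations."""
    user_id: str
    history: list = field(default_factory=list)    # past utterances
    likes: Counter = field(default_factory=Counter)  # topic -> times requested

    def record_turn(self, utterance: str, topic: str) -> None:
        self.history.append(utterance)
        self.likes[topic] += 1

    def favorite_topic(self) -> Optional[str]:
        if not self.likes:
            return None
        return self.likes.most_common(1)[0][0]


def greet(profile: UserProfile) -> str:
    """Personalize the opening line using what we know about the user."""
    topic = profile.favorite_topic()
    if topic is None:
        return "Hi! What would you like to hear about?"
    return f"Welcome back! Want more {topic} news?"


profile = UserProfile("user-1")
print(greet(profile))
profile.record_turn("tell me about smart speakers", "smart speaker")
profile.record_turn("anything new on smart speakers?", "smart speaker")
profile.record_turn("what about cars?", "automotive")
print(greet(profile))
```

A returning user gets a different opening line than a first-time one, which is the smallest possible version of a conversation that remembers you.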

Lastly, Martin points out that the next challenge is finding the right role for humans in the loop. And I couldn’t agree more with this one: I’m in favor of keeping a human in the loop in the dialog, at least initially. As a nice comparison he brings up Westworld, where there is a narrative department driving the hosts’ dialogs and personalities. When Teri asked me at Voice Summit, for his podcast Alexa in Canada, what I thought was next in voice, I replied that this is the platform for the creatives. Well crafted conversations won’t be driven by programmers but by writers, script writers and those connoisseurs of the human condition.

This is what’s coming in conversational AI: tools for language generation and dialog management systems, higher fidelity conversations, and humans in the loop. And who knows, maybe Westworld’s seamless conversations are not that far off.

Thank you for listening, and we’ll talk tomorrow.

What you need to know to start your week in voice

What are the 4 important news items you need to know to start your week in voice?

Amazon Alexa SDK for auto

Good morning, happy Monday. I hope you all have a great and productive week ahead. Last week Amazon announced their SDK for developers to build applications on Alexa for autos. The SDK is available for developers on GitHub. It includes support for streaming media, smart home controls, weather reports, and Alexa’s many skills. It’s the first time developers get a look at how Amazon wants to integrate Alexa into vehicles. It will take some time for developers to get familiar with Alexa Auto, so we’ll likely see more cars ship with Alexa in 2019 and beyond. What we are seeing now is Amazon moving into every possible context where their smart assistant Alexa might be. And rumor has it they might even be trying to release a smartphone. Alexa in cars will also be competing with Android Auto and Apple’s CarPlay for the attention of the automotive industry. According to a study by SEO Tribunal, the car is the place where users use voice search the most. So it’s expected that this will be a battleground for companies to dominate.

Samsung unveils their smart speaker: Galaxy Home

Samsung did some unveiling this past week, including their smart speaker Galaxy Home, landing with Bixby as its smart assistant. It was shown, but we don’t know when it is actually launching. The design is somewhat different from the other smart speakers on the market, and Twitter went all in on it: some comments said it looks like a grill, a spaceship or a portal entry. Pretty far off comparisons. Design specifics aside, it seems to be a high end device, with several speakers and bigger than the competitors. On the music streaming side, Samsung is going to carry Spotify from its phones to its refrigerators. I’m interested to see how their mobile Bixby assistant is received by users compared to Google’s.

Google released audio news, a functionality similar to Alexa’s flash briefings

Following the Lenovo launch of the first smart display for the Google Assistant, the company announced that users can now get video or audio news briefings to catch up on headlines. For now, the update is available only in the US and not open to the general public. According to their blog post, they will be learning from the US before expanding further. I do hope they open it to the general public. It’s good they came forward with a feature that’s growing in popularity every day for users worldwide. I can’t wait to delete the code we have now to make this briefing available on Google Assistant.

We are now part of VoiceFirst.fm

Last but not least, on Saturday we announced that we are now part of VoiceFirst.fm, along with This Week in Voice, Voice in Healthcare, Voice Marketing and other important podcasts in the voice ecosystem. We are thrilled to come onboard with the VoiceFirst.fm family and to build this partnership.

Thank you for listening, have a great day and week, and we’ll certainly talk tomorrow.

Content flow in smart speakers

Maybe you don’t need an app to be on voice platforms: you can just repurpose your content for smart assistant platforms. This is one of the things VoiceFirst Labs does for companies: repurpose their existing content for smart speaker platforms to reach millions of users. Google Actions offer a way to do just that. When we were looking for the best way to put our audio content in Google Actions for VoiceFirst Weekly, I was appalled by the fact that there wasn’t a simple way to do it, like flash briefings in Alexa. This was later confirmed by several more people who asked me how to do a flash briefing for Google Home and the Assistant. For a full explanation of how we did it, we published an episode called Sneak peek into VFW tools and processes that you can check out. But fear no more, there are ways to be in Actions with ease.
They are called content-based actions. For podcast, recipe and news publishers, if you use structured data markup and Accelerated Mobile Pages (AMP) in Google, they will automatically create actions for you, with a corresponding auto-generated page in the Assistant directory. Pretty neat.
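If you have never seen structured data markup, here is a rough sketch of what it can look like for a recipe, generated from Python. It uses the public schema.org Recipe vocabulary (name, recipeIngredient, recipeInstructions) in JSON-LD form. The helper names and the sample recipe are made up, and this episode doesn't cover which fields Google actually requires for a content-based action, so check their instructions before relying on it.

```python
import json


def recipe_jsonld(name: str, ingredients: list, steps: list) -> dict:
    """Describe a recipe with schema.org vocabulary as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "recipeIngredient": list(ingredients),
        "recipeInstructions": [
            {"@type": "HowToStep", "text": step} for step in steps
        ],
    }


def as_script_tag(data: dict) -> str:
    """Wrap the JSON-LD in the script tag that goes into the page's HTML."""
    payload = json.dumps(data, ensure_ascii=False)
    return f'<script type="application/ld+json">{payload}</script>'


# Sample content, invented for the example.
markup = recipe_jsonld(
    "Guacamole",
    ["2 avocados", "1 lime", "salt"],
    ["Mash the avocados.", "Add lime juice and salt."],
)
print(as_script_tag(markup))
```

The idea is that the page describes its own content in a machine-readable way, so the platform can build the voice experience without you writing an app.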
If you follow the instructions in each of the links I’m going to leave in this episode’s notes for each type of content, you’ll have your action in no time. Unfortunately, for audio content, you have to contact them, which I did, and I still haven’t heard back. I will let you know when I do.
In the case of Alexa, the easiest way to have content up and running is a flash briefing. The advantage is that you can provide either text to be read by Alexa or an audio source. When you are providing text, it’s important not to pass your blog post as is, because Alexa gets funny and confusing when reading lots of statistics or links. So that’s an important detail to take into account. And for audio in Alexa, if you already have a podcast, it’s easy enough to pass the URL where the latest published audio can be found. Ideally, flash briefings are not that long, but I listen to a lot of briefings of varying lengths. Summarizing: there are two main ways to have a content flow for smart assistant platforms. One is through text: Google Assistant users can read it as cards on their phones, and Alexa will read it out loud. The other is audio, whether it is a podcast or audio recordings.
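Here is a sketch of both Alexa options in code: cleaning up a blog post so links are not read out loud, and building one feed item with either text or an audio URL. The field names (uid, updateDate, titleText, mainText, streamUrl) follow my understanding of Amazon's flash briefing feed format, so treat them as assumptions and check the official reference. The helper names and the example.com URL are made up.

```python
import json
import re
from datetime import datetime, timezone

URL_PATTERN = re.compile(r"https?://\S+")


def clean_for_speech(text: str) -> str:
    """Drop raw links and collapse whitespace so the text reads well aloud."""
    without_links = URL_PATTERN.sub("", text)
    return " ".join(without_links.split())


def feed_item(uid: str, title: str, published: datetime,
              text: str = "", audio_url: str = "") -> dict:
    """Build one feed entry; pass audio_url for an audio briefing."""
    item = {
        "uid": uid,
        # Timestamp in UTC; expects a timezone-aware datetime.
        "updateDate": published.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%S.0Z"
        ),
        "titleText": title,
        "mainText": clean_for_speech(text),
    }
    if audio_url:
        item["streamUrl"] = audio_url
    return item


when = datetime(2018, 8, 20, 12, 0, tzinfo=timezone.utc)
text_item = feed_item(
    "episode-63", "Text to speech services", when,
    text="Full notes at https://example.com/notes   for this episode.",
)
audio_item = feed_item(
    "episode-63-audio", "Text to speech services", when,
    audio_url="https://example.com/audio/63.mp3",
)
print(json.dumps([text_item, audio_item], indent=2))
```

The cleanup here only removes links. Statistics-heavy paragraphs still deserve a rewrite by hand before Alexa reads them.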
What are you doing today to have your content on smart platforms? Content is no longer king, guys, context is. Make your content available in any possible context consumers might be in.
Thank you for listening!

Relevant links for content creation:

Google Actions
Content based actions in Google Actions

Podcast action

Recipe action

News action (text based)

Amazon Alexa

Flash briefings