Episode Archives

Internationalization on voice applications

Thank you, Mathew, for presenting the show today. Welcome to this very special episode of VoiceFirst Weekly.

Availability in different languages has been in the news a lot recently for the popular smart assistants.

Over the past months, the availability of both Google Assistant and Alexa in new languages and regions has been constant news. The race to win in every language has never been greater; the website era will look outdated in a few years. Text-to-speech advances will soon allow, and people will soon expect, applications to be available in every possible language. And let's be clear: the biggest companies in the world today are available in dozens of languages. This trend is only going to grow, and users will come to expect it.

You should not have to learn English to use the internet. The next billion users expect more content in their languages.

Caesar Sengupta, VP, Next Billion Users Team at Google

That's all good, Mari, tell me more

So here's what I think you should do: design so your application can be adapted to various languages and regions with as little engineering effort as possible. This is probably the easiest part in the case of both Amazon Alexa and Google Assistant, as they offer options to add languages and locales. You don't need tons of message files in your code; as long as you have the text translated, it works. But that's not enough. Working with Disney Studios taught me the landmark work and relevance of distributing content for almost every country on earth, and it taught me to think about internationalization upfront, not as a feature to be added later. It's not only engineering work.
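A minimal sketch of what that looks like in practice, assuming a folder of per-locale JSON message files (all names here are illustrative, not any platform's SDK). The point is that adding a language means adding one file, not touching code:

```python
import json
from pathlib import Path

DEFAULT_LOCALE = "en-US"

def load_messages(locale: str, messages_dir: str = "locales") -> dict:
    """Load the message catalog for a locale, falling back to the default."""
    path = Path(messages_dir) / f"{locale}.json"
    if not path.exists():
        path = Path(messages_dir) / f"{DEFAULT_LOCALE}.json"
    return json.loads(path.read_text(encoding="utf-8"))

def prompt(key: str, locale: str, **params) -> str:
    """Look up a prompt by key and fill in runtime parameters."""
    return load_messages(locale)[key].format(**params)

# locales/en-US.json: {"welcome": "Welcome back, {name}!"}
# locales/es-ES.json: {"welcome": "¡Bienvenido de nuevo, {name}!"}
# prompt("welcome", "es-ES", name="Mari") -> "¡Bienvenido de nuevo, Mari!"
```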

Internationalization requires 3 fundamentals:

  • Language translation
  • Culture translation: everything linked to culture, like dialects, food, architecture and even jokes.
  • And Distribution. You might not know in the beginning all of the translations you want for your voice app, but it’s safe to assume you’ll eventually want it in at least one more language.

Be sure to account for all 3 fundamentals when planning, designing and writing your voice application. For voice apps on top of Google’s or Amazon’s smart assistants, distribution is embedded in the platforms, and you should probably focus more on discoverability and marketing in those locales.

Synthetic voices

The challenge for voice applications is similar in several regards to that of distributing animated movies and series. With Amazon Polly, Google's voices or Talestreamer, the key is to adapt the tone to the language, culture and locale, and to adapt the voices to the users and the application context. Be mindful and respectful when translating. Synthetic voices add a layer of communication style, along with SSML, the Speech Synthesis Markup Language, a standard for styling your voices.
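As a sketch of what that styling looks like in practice (assuming AWS credentials are configured; the voice and prosody values are illustrative, so tune them per locale), here is SSML passed to Amazon Polly through boto3:

```python
import boto3

polly = boto3.client("polly")

# A slower rate and a short pause can make a greeting feel less robotic;
# adapt these per language and locale rather than reusing one style everywhere.
ssml = (
    "<speak>"
    '<prosody rate="95%">Bienvenue à VoiceFirst Weekly.</prosody>'
    '<break time="300ms"/>'
    "On parle d'internationalisation aujourd'hui."
    "</speak>"
)

response = polly.synthesize_speech(
    Text=ssml,
    TextType="ssml",
    OutputFormat="mp3",
    VoiceId="Celine",  # a French voice; pick one that fits the locale
)

with open("greeting.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```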

This is a briefings show, and episodes are short

There’s so much to cover in this topic that one episode is just not enough! It doesn’t help that this is something I’m very passionate about.

What do you do if you don't know any other language? Thank Twitter: go there and ask for help. The voice community is incredibly helpful.

In the notes for this episode at voicefirstweekly.com/flashbriefing/62, you'll find YouTube and Google Assistant sound effects, Amazon Polly synthetic speech, and guides on how to add internationalization to your Google Actions and Alexa skills. I also added some skills that are available in several languages and are rocking it!

Thank you for listening, and we'll talk tomorrow.

P.S. There is an interesting emerging category called synthetic media; stay tuned for future flash briefings on this subject. Meanwhile, Matt Hartman discusses the topic frequently in his newsletter, Hearing Voices.

Resources

Amazon Polly synthetic voices

https://console.aws.amazon.com/polly/home/SynthesizeSpeech

Google sound library

https://developers.google.com/actions/tools/sound-library/

Internationalization in Google Actions

https://medium.com/orbismobile/hello-salut-hola-internationalization-in-your-google-actions-772a63989c10

Build a multi-language Alexa skill

https://developer.amazon.com/blogs/post/Tx2XUAQ741IYQI4/how-to-build-a-multi-language-alexa-skill

Alexa Voice Service: Prepare for internationalization

https://developer.amazon.com/docs/alexa-voice-service/prepare-for-internationalization.html

YouTube sound effects

https://www.youtube.com/audiolibrary/soundeffects

What’s next on conversational AI

Definition time: what's conversational AI?

It refers to systems you can interact with in a conversational manner; it answers the need for humans and computers to interact more naturally. Even though this might look like a natural progression of what we do with our machines, it represents a major shift in computing. So the first step in defining conversational AI is recognizing that we are entering a completely new realm.

Thus, when you see alarming news that your Alexa or Google Home can't do this or that, or that your chatbot isn't transforming your company's customer service, it's because consumers are just starting to notice these interfaces; we are all learning how to build for better conversations and how to interact with these devices. Maybe the problem is how we are labeling the technology, because despite the tremendous progress natural language processing, speech-to-text and machine learning have made in the past decades, we cannot honestly call it conversational. It's misleading. These limitations expose the challenges of natural language understanding. We know where we are now: people are interacting with smart assistants more every day. Baidu's DuerOS in China has reached more than 100 million devices.

But what’s next? Enter the VentureBeat article:

According to Martin, the first thing that has to come next for conversational AI is new tools beyond machine learning. Natural language generation should be the focus of the next set of tools in the conversational space.

The second is higher-fidelity conversations. This involves wide and deep conversations, personalization and multimodality, and it will require tracking the state of previous interactions and identifying individual users' likes and styles.
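As a toy illustration of that state tracking (purely hypothetical names, not any vendor's API), each turn can read and update a per-user record that later turns use to personalize responses:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DialogState:
    """What we remember about a user between turns."""
    last_intent: Optional[str] = None
    preferences: dict = field(default_factory=dict)
    turn_count: int = 0

# Keyed by user id; a real system would persist this across sessions.
sessions: dict = {}

def handle_turn(user_id: str, intent: str, slots: dict) -> DialogState:
    state = sessions.setdefault(user_id, DialogState())
    state.turn_count += 1
    state.last_intent = intent
    # Remember stated preferences so later turns can personalize replies.
    state.preferences.update(slots)
    return state
```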

Lastly, Martin points out that the next challenge is finding the right role for humans in the loop. And I couldn't agree more with this one, in favor of keeping a human in the loop in the dialog, at least initially. As a nice comparison he brings up Westworld, where there is a narrative department driving the hosts' dialog and personalities. When Teri asked me at Voice Summit, for his podcast Alexa in Canada, what I thought was next in voice, I replied that this is the platform for creatives. Well-crafted conversations won't be driven by programmers but by writers, scriptwriters and those connoisseurs of the human condition.

This is what's coming in conversational AI: tools for language generation and dialog management systems, higher-fidelity conversations, and humans in the loop. And who knows, maybe Westworld's seamless conversations are not that far off.

Thank you for listening, and we’ll talk tomorrow.

What you need to know to start your week in voice

What are the four important news items you need to know to start your week in voice?

Amazon Alexa Auto SDK

Good morning, happy Monday. I hope you all have a great and productive week ahead. Last week Amazon announced their SDK for developers to build applications on Alexa for cars. The SDK is available to developers on GitHub and includes support for streaming media, smart home controls, weather reports, and Alexa's many skills. It's the first time developers get a look at how Amazon wants to integrate Alexa into vehicles. It will take some time for developers to get familiar with Alexa Auto, so we'll likely see more cars ship with Alexa in 2019 and beyond. What we are seeing now is Amazon moving into every possible context where their smart assistant might be. And as rumor has it, they might even be trying to release a smartphone. Alexa in cars will also be competing with Android Auto and Apple's CarPlay for the attention of the automotive industry. According to a study by SEO Tribunal, the car is the place where users use voice search the most, so this is expected to be a battleground for companies to dominate.

Samsung unveils their smart speaker: Galaxy Home

Samsung did some unveiling this past week, among which was their smart speaker, Galaxy Home, landing with Bixby as its smart assistant. It was shown, but we don't know when it is actually launching. The design is somewhat different from the other smart speakers on the market, and Twitter went all in on it: comments compared it to a grill, a spaceship and a portal entry. Pretty far-off comparisons. Design specifics aside, it seems to be a high-end device with several speakers, bigger than its competitors. On the music streaming side, Samsung is going to ride Spotify from its phones to its refrigerators. I'm interested to see how their mobile Bixby assistant is received by users compared to Google's.

Google released audio news, a functionality similar to Alexa’s flash briefings

Following the Lenovo launch of the first smart display for the Google Assistant, the company announced that users can now get video or audio news briefings to catch them up on headlines. For now, the update is available only in the US and not open to the general public. According to their blog post, they will be learning from the U.S. rollout before expanding further. I do hope they open it to the general public; it's good they came forward with a feature that's growing in popularity every day for users worldwide. I can't wait to delete the code we have now to make this briefing available on Google Assistant.

We are now part of VoiceFirst.fm

Last but not least, on Saturday we announced that we are now part of VoiceFirst.fm, along with This Week in Voice, Voice in Healthcare, Voice Marketing and other important podcasts in the voice ecosystem. We are thrilled to come aboard the VoiceFirst.fm family and to build this partnership.

Thank you for listening, have a great day and week, and we’ll certainly talk tomorrow.

Content flow in smart speakers

Maybe you don't need an app to be on voice platforms; you can just repurpose your content for smart assistant platforms. This is one of the things VoiceFirst Labs does for companies: repurposing their existing content for smart speaker platforms to reach millions of users. Google Actions offer a way to do just that. When we were looking for the best way to put our audio content into Google Actions for VoiceFirst Weekly, I was appalled by the fact that there was no simple way, like flash briefings on Alexa, to do it. This was later confirmed by several more people who have asked me how to do a flash briefing for Google Home and the Assistant. For a full explanation of how we did it, check out the episode we published called Sneak peek into VFW tools and processes. But fear no more, there are ways to be in Actions with ease.
They are called content-based actions. For podcasts, recipes and news publishers: if you use structured data markup and Accelerated Mobile Pages with Google, they will automatically create actions for you, with a corresponding auto-generated page in the Assistant directory. Pretty neat.
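As a rough sketch of what that structured data looks like (the field values are illustrative; check Google's docs for the exact required properties), a recipe page might embed schema.org markup like this, generated here with Python so it can come straight from your publishing pipeline:

```python
import json

# schema.org Recipe markup; embedding this JSON-LD in your pages is the
# structured data that content-based actions are generated from.
recipe_markup = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Arroz con Leche",
    "author": {"@type": "Person", "name": "Mari"},
    "recipeIngredient": ["1 cup rice", "4 cups milk", "1/2 cup sugar"],
    "recipeInstructions": [
        {"@type": "HowToStep", "text": "Simmer the rice in the milk."},
        {"@type": "HowToStep", "text": "Stir in the sugar and serve warm."},
    ],
}

# Render the <script> tag to drop into the page template.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(recipe_markup)
    + "</script>"
)
print(script_tag)
```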
If you follow the instructions in each of the links I'm going to leave in this episode's notes for each type of content, you'll have your action in no time. Unfortunately, for audio content you have to contact Google directly, which I did, and I still haven't heard back. I'll let you know when I do.
In the case of Alexa, the easiest way to have content up and running is flash briefings. The advantage is that you can provide either text to be read by Alexa or an audio source. When you are providing text, it's important not to pass your blog post as is, because Alexa gets funny and confusing when reading lots of statistics or links, so that's an important detail to take into account. For audio on Alexa, if you already have a podcast, it's easy enough to pass the URL where the latest published audio can be fetched; a sketch of such a feed follows below. Ideally flash briefings are not that long, but I listen to a lot of briefings of varying lengths. Summarizing: there are two main ways to have a content flow for smart assistant platforms. One is text, which Google Assistant users can read as cards on their phones and Alexa will read out loud. The other is audio, whether it's a podcast or audio recordings.
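For reference, a flash briefing skill points at a JSON feed roughly like the one below. This is a minimal sketch based on the documented feed fields (the URLs are placeholders), generated here from Python as your publishing pipeline might do:

```python
import json
from datetime import datetime, timezone

def briefing_item(title: str, audio_url: str, episode_url: str) -> dict:
    """Build one flash briefing feed item. For a text briefing, fill in
    mainText and drop streamUrl so Alexa reads the text aloud instead."""
    now = datetime.now(timezone.utc)
    return {
        "uid": f"urn:example:briefing:{now:%Y%m%d}",
        "updateDate": now.strftime("%Y-%m-%dT%H:%M:%S.0Z"),
        "titleText": title,
        "mainText": "",
        "streamUrl": audio_url,
        "redirectionUrl": episode_url,
    }

feed = briefing_item(
    "Content flow in smart speakers",
    "https://example.com/episodes/latest.mp3",
    "https://voicefirstweekly.com/flashbriefing/62",
)
print(json.dumps(feed, indent=2))
```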
What are you doing today to get your content onto smart platforms? Content is no longer king, guys; context is. Make your content available in every possible context a consumer might be in.
Thank you for listening!

Relevant links for content creation:

Google Actions
Content-based actions in Google Actions

Podcast action

Recipe action

News action (text based)

Amazon Alexa

Flash briefings

Google Actions built-in intents for better discoverability

There are a couple of interesting things Google is adding to their Actions. One is the upcoming App Actions, which we talked about last Saturday (go and listen to that one too), and which will let you turn your Android mobile application into an Action relatively easily. The second is the recent update to the Actions console. To manage the complexity of the multiple ways a user can ask for your action, they introduced built-in intents. Built-in intents are currently in developer preview: you can build and publish Actions that use them, but Google is still working on fine-tuning discovery and ranking.
A built-in intent is a unique identifier that tells the Assistant that your Action is suitable to fulfill a specific category of user requests, such as playing games or ordering tickets.
During Action discovery, the Assistant can use information about your Action, including its attached built-in intents, to suggest it to users only if it is relevant.
To minimize conversational round-trips, the Assistant will attempt to extract parameters from user queries that map to built-in intents and pass them on to your Action. To learn more about these parameters and see examples of user queries, see the Built-in intents reference.
Requirements
For now, built-in intents are available only in en-US, but more locales are coming soon:

  • de-DE
  • fr-FR
  • ja-JP
  • ko-KR

Best practices for using built-in intents include:
Map built-in intents to specific actions: For example, if your Action supports the PLAY_GAME built-in intent and receives that intent, you should immediately send the user to the game feature of your Action. Avoid asking the user again if they want to play a game.
Make sure to use the built-in intent parameter values that the Assistant sends to your fulfillment. Avoid re-prompting the user for those values.
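To see what that means in fulfillment code, here is a minimal sketch (a hypothetical Flask webhook; the exact request and response schemas should be verified against the Actions SDK documentation). The handler branches on the built-in intent and reuses the parameters the Assistant already extracted instead of re-prompting:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/fulfillment", methods=["POST"])
def fulfillment():
    body = request.get_json()
    # The conversation webhook carries the triggering intent and any
    # pre-extracted parameters under inputs[0]; the exact shape varies
    # by SDK version, so treat this as an assumption to verify.
    first_input = body.get("inputs", [{}])[0]
    intent = first_input.get("intent", "")
    args = {a.get("name"): a.get("textValue")
            for a in first_input.get("arguments", [])}

    if intent == "actions.intent.PLAY_GAME":
        # Jump straight into the game feature; don't ask again.
        game = args.get("game") or "our default game"
        reply = f"Loading {game}. Let's play!"
    else:
        reply = "Welcome! Say 'play a game' to get started."

    # Placeholder response JSON, not the exact Actions response schema.
    return jsonify({"textToSpeech": reply})
```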

Why do built-in intents matter?

It's a step toward better Action discoverability, the main pain point of voice applications today. I'm curious to know how the ranking system will work to select Actions, but for now it seems smart to start integrating built-in intents into your applications.

As you see, Google is still about ranking algorithms!

Me

Thank you for listening, we’ll talk tomorrow!