Archive Monthly Archives: August 2018

VoiceFirst infinity stones

I saw this image of a voice gauntlet by Mark Tucker on Twitter. Are you fans of Marvel? I wouldn’t say I’m a fan of anything, because I don’t lose it over teams or companies, but if I have to be a fan of something, it has to be Pixar and/or Marvel. And for those who have seen their latest and biggest success, Infinity War, where the whole cinematic universe clicks together like a big puzzle, you have heard about the gauntlet and the infinity stones. In this episode we take you to the wonderful world of the VoiceFirst infinity stones.
Let me remind you first that there are six stones in the Marvel universe:

Soul, reality, space, time, power and mind.

The VoiceFirst infinity stones

Now for VoiceFirst, that translates as follows: personalization is the soul stone, monetization is the reality stone, discoverability is the space stone, convenience is the power stone, retention is the time stone, and context and memory is the mind stone.
With monetization, discoverability, personalization, convenience, retention, and context and memory, almost all of the voice puzzle comes together as well.

There is another stone

But I think a piece is missing: the tongue stone, internationalization. Voice is the technology that can understand people, instead of people understanding and adapting to the technology. And to reach more users and move beyond the English-speaking countries, internationalization has to become an upfront strategy for voice.
Each stone has different challenges and a road to travel before we can say it’s solved, but they are the pieces that need to come together for great voice experiences. I hope it does not take us 18 movies to figure it out!
Go to to see the image and the tweet.
Send me Marvel fandom comparisons with voice tech if you have any, or send me any Marvel fandom stuff; I promise to watch and read.
Remember to subscribe, like, comment and share these episodes. My name is Mari, and you can find me on Twitter as voicefirstlabs and on Instagram @voicefirstweekly. Thank you for listening, have a great day, and we’ll talk tomorrow!

It’s Thursday newsletter day, here’s a preview

It’s Thursday, newsletter day. Every Thursday at 9:50 AM Pacific we deliver our weekly installment on voice technology. Subscribe to get content we don’t talk about here, because its nature is longer or more thoughtful than the short format of these episodes allows. Here is Wavenet E for you with a preview:

Hi there, this is Google Voice Wavenet E. Weird name, but the host of this podcast cannot talk about weird names; I can’t even pronounce hers. Here is a summary of branding in voice covered in today’s newsletter:

More brands are recognizing the influence of voice. Today’s cases: Petco voice experiences, and why voice integration for brands is more important than voice search.

Voice commerce considered next retail disruptor.

Yes, I do consider the future of media to be synthetic media. So we’ll talk soon.


Google GA of Cloud Text-to-Speech WaveNet voices, with a special guest

Google announced the general availability of Cloud Text-to-Speech, new audio profiles that optimize sound for playback on different devices, enhancements to multichannel recognition and more.
Actually, to show you how meta we are at VoiceFirst Weekly: starting yesterday, Cloud Text-to-Speech offers multilingual access to voices generated using WaveNet, a machine learning technique developed by Alphabet subsidiary DeepMind. And this bit is generated from a WaveNet voice for US English. Wow!
Cloud Text-to-Speech offers 17 new WaveNet voices; in the episode notes you can find the link to all the available voices and languages. Wait, there is more. The company also announced audio profiles, which were previously available in beta.
Audio profiles let you optimize the speech produced by Cloud Text-to-Speech’s APIs for playback on different types of hardware. You can create a profile for wearable devices with smaller speakers, for example, or one specifically for car speakers or headphones. It’s particularly handy for devices that don’t support specific frequencies; Cloud Text-to-Speech can automatically shift out-of-range audio to within hearing range, enhancing its clarity.
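To make the idea of audio profiles a bit more concrete, here is a minimal sketch in Python of what a synthesis request with a device profile might look like. It only builds the request body as a plain dictionary and does not call Google at all. The field names (`audioConfig`, `effectsProfileId`) and the profile identifiers reflect my understanding of the Cloud Text-to-Speech REST API and should be treated as assumptions to verify against the official documentation; `DEVICE_PROFILES` and `build_synthesis_request` are hypothetical names made up for this example.

```python
import json

# Assumed profile identifiers; check them against the official supported list.
DEVICE_PROFILES = {
    "wearable": "wearable-class-device",
    "headphones": "headphone-class-device",
    "car": "large-automotive-class-device",
}


def build_synthesis_request(text, device, voice_name="en-US-Wavenet-D"):
    """Build a request body (a plain dict) asking for audio tuned to one device type."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": "en-US", "name": voice_name},
        "audioConfig": {
            "audioEncoding": "MP3",
            # The audio profile is what adapts the output to the playback hardware.
            "effectsProfileId": [DEVICE_PROFILES[device]],
        },
    }


request = build_synthesis_request("Welcome to VoiceFirst Weekly.", "wearable")
print(json.dumps(request, indent=2))
```

The same text can be requested once per device type, so a watch and a car stereo each get audio shaped for their own speakers.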
The other features that are part of the announcement are multichannel recognition, language auto-detect, and word-level confidence.
Multi-channel recognition offers an easy way to transcribe multiple channels of audio by automatically denoting the separate channels for each word.
Language auto-detect lets you send up to four language codes at once in queries to Cloud Speech-to-Text. The API will automatically determine which language was spoken and return a transcript, much like how the Google Assistant detects languages and responds in kind. (Users also get the choice of selecting a language manually.)
Word-level confidence offers developers fine-grained control over Google’s speech recognition engine. If you so choose, you can tie confidence scores to triggers within apps, like a prompt that encourages the user to repeat themselves if they mumble or speak too softly.
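Here is a minimal sketch, in Python, of how an app might tie word-level confidence scores to a reprompt. It does not call the Speech-to-Text API; it assumes you already have a list of words with confidence values between 0 and 1. The `WordResult` type, the function names and the 0.6 threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordResult:
    word: str
    confidence: float  # assumed to range from 0.0 (no confidence) to 1.0


def low_confidence_words(words, threshold=0.6):
    """Return the words whose confidence falls below the threshold."""
    return [w.word for w in words if w.confidence < threshold]


def should_reprompt(words, threshold=0.6):
    """Ask the user to repeat themselves if any word was poorly recognized."""
    return len(low_confidence_words(words, threshold)) > 0


# Hypothetical recognition result for "turn off lights", with a mumbled last word.
transcript = [
    WordResult("turn", 0.95),
    WordResult("off", 0.91),
    WordResult("lights", 0.42),
]

if should_reprompt(transcript):
    print("Sorry, I didn't catch that. Could you repeat it?")
```

The threshold is the design decision here: set it too high and the app nags users, set it too low and it acts on words it misheard.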
In the monthly free tier users will have up to 4 million characters for the standard voices and up to 1 million characters for the wavenet voices.
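As a quick back-of-the-envelope check, this small Python sketch works out how many characters in a month fall outside the free tier quoted above. Only the two limits come from the announcement; the function name and the example usage figures are made up, and no prices are included because none are given here.

```python
# Monthly free-tier limits quoted above, in characters.
FREE_TIER_CHARS = {
    "standard": 4_000_000,
    "wavenet": 1_000_000,
}


def billable_characters(voice_type, characters_used):
    """Return how many characters exceed the monthly free tier for a voice type."""
    free_limit = FREE_TIER_CHARS[voice_type]
    return max(0, characters_used - free_limit)


# Hypothetical monthly usage.
print(billable_characters("standard", 3_000_000))
print(billable_characters("wavenet", 1_250_000))
```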
End of synthesized voice.
Doesn’t that sound amazingly real? This is Mari, your usual host. I wanted to give you a real scoop of the Google WaveNet voices. I tried several WaveNet voices, and as the alphabet went up, I felt the voices were improving and sounded more natural. This one is Wavenet D. Synthesized voices that feel natural like this one will have a huge impact.
Maybe I’ll let Wavenet D host more episodes! Did it sound natural to you? Let me know what you think.
Remember to like, comment and subscribe, and tell me what you would like to hear me talk about in future episodes. Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really. I love to interact about this space. My name is Mari, and I’ll talk to you tomorrow.

Here are the resources for the new wavenet voices:

Supported voices

How to create audio

Here is the pricing

And these are the other voices:


The next wave of chatbots

Hello, there! Welcome to VoiceFirst Weekly.

Three or four years ago chatbots were set to be the Next Big Thing. It hasn’t been so, and according to a recent report by VentureBeat, the trend is moving toward rule-based chatbots: those that are built on a set of predefined rules, also referred to as “dumb,” as opposed to those that depend on machine learning. The VentureBeat article highlighted how chatbots have failed to provide the expected returns for a lot of companies. Further evidence is how chatbot platform providers like Amazon are basing their chatbot platforms on this rule-based model, which is easier to implement and easier to use.
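To show what “built on a set of predefined rules” can mean in practice, here is a minimal sketch of a rule-based chatbot in Python. It is an illustration only, not how any particular platform implements it: each rule is a pattern paired with a canned response, and anything that matches no rule gets a fallback. The rules, responses and the `reply` function are all invented for this example.

```python
import re

# Each rule pairs a pattern with a canned response; the first match wins.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE),
     "Hello! How can I help you today?"),
    (re.compile(r"\b(hours|open)\b", re.IGNORECASE),
     "We are open from 9 AM to 5 PM, Monday to Friday."),
    (re.compile(r"\b(bye|goodbye)\b", re.IGNORECASE),
     "Goodbye! Have a great day."),
]

FALLBACK = "Sorry, I don't understand. Could you rephrase that?"


def reply(message):
    """Return the response of the first matching rule, or the fallback."""
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return FALLBACK


for message in ["Hello there", "What are your hours?", "Tell me a joke"]:
    print(f"> {message}")
    print(reply(message))
```

The appeal is clear from the code: there is no training data and no model, so the behavior is predictable. The limitation is just as clear, since anything outside the rules lands on the fallback.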
Does that mean that chatbots are dead? Certainly not. Let’s say that was the chatbot 1.0 wave. And to all of the disappointment that chatbot progress so far might have brought: as Bill Gates puts it, “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten.”

Chatbots in their current state have proved unable to handle specialized queries requiring knowledge outside their functional domain. The next wave of chatbots is going to enhance their capabilities to create a completely custom, differentiated experience by combining knowledge across relevant segments and providing better insights. According to PwC, this will give rise to a new level of conversational [banking] where results are delivered instantly through real-time conversation.
What will be the 3 main points driving the next wave for chatbots?
Number 1. Drive customer loyalty and brand awareness. When designed right, chatbots can add emotional power to the interaction with the user, enhancing captivation and loyalty. As sentiment analysis keeps improving, the next chatbots will be able to leverage the sentiment of the user to provide the best solutions for the context and the sentiment the user is in when interacting.
Number 2. Create a cognitive institution
A chatbot can be designed to respond to all kinds of requests and queries. It can become the insights database driving decision making with data based analytics for your company.
And number 3. Integration with present and future technologies
Having reached a certain level of maturity, chatbots can now look at more integrations to leverage innovative technologies, leading to a new set of use cases for chatbots that are not being considered today. The ManyChat CEO has predicted that chatbots will transition from the early adopter stage into the beginning of the early majority in 2018, and that more than 1 million bots will be created on Facebook Messenger.
In the future of jobs to be done and a master assistant that we seem to be moving toward, bots will not just interact with customers and human agents but, increasingly, with other bots in order to get tasks done.
Chatbots are leaving their infancy to enter a more mature stage in the next 5 years.
The links for the resources mentioned in this episode will be available at
Thank you for listening. Remember to like, comment and subscribe, and tell me what you would like to hear me talk about in future episodes. Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really. I love to interact about this space. My name is Mari, and I’ll talk to you tomorrow.

What you need to know to start your week in voice

Hello, happy Monday! Welcome to VoiceFirst Weekly! What you need to know to start your week in voice: First

Shortwave, Google’s new experiment on audio

According to a report by The Verge, Google is working on an experimental podcast app called Shortwave. The app was discovered through a trademark filing, which describes it as “allow[ing] users to search, access, and play digital audio files, and to share links to audio files.”

A Google representative said the focus of the app is on spoken-word content and that the project, which is being developed in the Area 120 incubator, will help users discover and consume spoken-word audio in new ways. It’s an early experiment and they didn’t give more details.

This comes after Google released the Podcasts app, which we made an episode on probably a month ago. It’s unknown what the difference between Shortwave and Google Podcasts will be, but it’s clear that while AI is at the forefront of the company, they are betting hard on audio and a voice-first future.

GOV.UK gets a voice

“As a government we need to approach voice services in a consistent way.” That’s a quote from an article published on the GOV.UK blog. GOV.UK is incorporating voice assistants into its digital strategy. Smart speaker ownership in the UK is up to 8% of adults, a gain of 3 points in 2018 alone. For the team behind the work, conversations with Amazon and Google made it clear that many users are asking questions where government is the best source.

For GOV.UK, working on voice is an opportunity to meet the rising expectations of users and make government more accessible.

GOV.UK is designing for scale, for answering government-related questions, for consistency and for multiple platforms. The site is aware of the current challenges, like the privacy and identification that many government services require. But they are also very aware of the advantages, so they are playing wait-and-see on the present challenges while providing users on voice platforms with what they can offer today.

This is not the first time a government service has gained a voice: the city of Ozark, Missouri has an Alexa skill developed by the guys at VoiceXP. But I do think it is the first time at a national level. There is so much potential in government services for conversational interfaces that I’m sure we’ll see more use cases like GOV.UK emerging.

Thank you for listening; have a productive week and an awesome day. Remember to like, comment and subscribe, and tell me what you would like to hear me talk about in future episodes. Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really. I love to interact about this space. My name is Mari, and I’ll talk to you tomorrow.

Extended Alexa fellowship

As part of their belief that voice is the future, Amazon is expanding their Alexa Fellowship with a Fund for Investing in Student Scientists and Entrepreneurs.
Amazon is betting not only on developers but also on the academic community, supporting researchers at top universities focused on speech and language technologies. The initiative is divided into the Graduate Fellowship and the Innovation Fellowship. Universities that are part of this program include Carnegie Mellon University (Pittsburgh, PA), the International Institute of Information Technology (Hyderabad, India) and Johns Hopkins University.
A few days ago, in the episode about voice space fragmentation, I was talking about how Amazon is the company investing the most resources in developers, and they are extending that work to researchers too. It’s a thing to watch.
The link to the announcement is in this episode’s notes.
Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really. I love to interact about this space. My name is Mari, and I’ll talk to you tomorrow.

Branding in voice is brands’ biggest challenge for the next years

Branding in voice needs to be respectful of the user. The key will be to bring branding in a way that is seamless to the user experience. You need to be careful about when to do branding in voice and know where to cut off. You don’t want to interrupt in the middle of a sentence.

The context the user is in is relevant as well: what works on one platform might not work on another. If you are promoting in Google Assistant versus your own app, the considerations are different, since those are not technically your users, in the sense that you don’t know how they landed in your action.

Considering a voice strategy and branding in voice will become more and more relevant as consumers get used to talking to their devices and expect to have conversations with almost every electronic thing. Voice will be the biggest challenge for brands since they had to go on the internet. As the number of users using smart speakers from their phones grows, the first thing you can do is integrate voice into your mobile app or into a chatbot on your website. Apple will be releasing Shortcuts in September, a way to integrate with Siri, and Google is said to be launching App Actions, a way to customize current Android applications to be available in the actions store. The other strategy you can deploy is creating content for the home environment. Smart assistants’ preferred place today is our homes. Meet your users where they are by integrating content related to the activities that are typically done at home. And above everything, please don’t underestimate the impact voice will have on branding.

Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really. I love to interact about this space. My name is Mari, and I’ll talk to you tomorrow.

VentureBeat wrote a piece on why branding is hard in voice, check it out. This is another good piece on how brands can benefit from voice technologies.

And this is our briefing on how App Actions work.

Thank you for listening, you have a great day and we’ll talk tomorrow!

5 Insights of voice technologies

Scott Huffman, Google Assistant VP of Engineering, wrote an article outlining the 5 insights of voice technology they have gathered since launching Google Assistant 2 years ago.

Here they are, for you today, commented by me.

Number 1. Voice is about action. Assistant queries are 40 times more likely to be action-oriented than Search, with people asking for things like “send a text message,” “turn off the lights,” or “turn on airplane mode.”

As Ursula K. Le Guin put it beautifully: “When you speak a word to a listener, the speaking is an act.”

And this is what actually makes this technology unique: it lets you complete a task in a natural way.

Number 2. People expect conversations. For voice assistants, according to Huffman, people start querying in a command-like way, but expectations go up pretty quickly: they expect conversations. On average, Assistant queries are 200 times more conversational than Search. Simple commands can take thousands of forms. For example, people ask the Google Assistant to set an alarm in more than 5,000 different ways, which means that they have to build the Assistant to understand this conversational complexity.

Number 3. Screens change everything. Nearly half of interactions with Assistant today include both voice and touch input. Screens bring a completely new canvas for conversational AI, where we can bring together voice and touch in an intelligent way.

Number 4. Daily routines matter. Where and how people use the Assistant varies throughout the day, but the consistency of the experience should stay the same. As I have said before, in voice applications it is not content but context that is king. If you look at the patterns in the daily routines of Google Assistant users, their activities vary depending on the time of day.

Number 5. Voice is universal. Because the barrier to entry for voice is so low, it can be used by people across devices, languages and geographies. According to Huffman, they have seen that Google Assistant users defy the early adopter stereotype: there’s a huge uptick in seniors and families, and women are the fastest growing user segment for the Assistant.


The voice space is new and we are still learning what works. However, the technology’s clear case for getting jobs done, for everyone, is encouraging.

Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really. I love to interact about this space. My name is Mari, and I’ll talk to you tomorrow.

This week in smart devices

Two recent, relevant news items in smart devices.
Saint Louis University is placing 2,300 Echo Dots in student living spaces.
Saint Louis University has announced that it will be placing Amazon Echo Dot devices, powered by Alexa for Business, in every student residence hall room or student apartment on campus. While other colleges, like Arizona State University, have put Echo Dots in student housing before, SLU says this is the first time a college will equip every student living space with an Amazon Alexa-enabled device.
The smart speaker race is heating up
The Statista website published an infographic with an analysis of the second quarter of 2018 smart speaker market.
The smart speaker market continues to grow, with global shipments tripling from 3.9 million units in the second quarter of last year to 11.7 million units in this year’s June quarter. Competitors are slowly eating away at Amazon’s once dominant lead. Google has gained some ground, growing its market share from 16.1 percent in the second quarter of 2017 to 27.6 percent in Q2 2018. China’s Alibaba is commanding Asia’s smart speaker market, with 7% of global shipments.
And Baidu’s DuerOS is available on 100 million devices.
Will Facebook release another device before the year ends? The race is hot!
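For anyone who wants to check the numbers quoted above, here is a small Python sketch of the arithmetic. The figures are the ones from the Statista summary in this episode; the function names are just illustrative.

```python
def growth_factor(old, new):
    """How many times larger the new value is than the old one."""
    return new / old


def share_change_points(old_pct, new_pct):
    """Change in market share, in percentage points."""
    return new_pct - old_pct


# Global smart speaker shipments, in millions of units (Q2 2017 vs. Q2 2018).
shipments = growth_factor(3.9, 11.7)

# Google's market share, in percent (Q2 2017 vs. Q2 2018).
google = share_change_points(16.1, 27.6)

print(f"Shipments grew {shipments:.1f}x year over year")
print(f"Google gained {google:.1f} percentage points of market share")
```

Note the difference between percent and percentage points: Google’s share rose by 11.5 points, which is a much larger jump in relative terms.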

Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really. I love to interact about this space. My name is Mari, and I’ll talk to you tomorrow.

Voice space fragmentation

The rumours and news of Facebook Messenger’s and Instagram’s voice input features, plus Samsung’s Bixby announcement, got me thinking about how quickly the voice space is diversifying. Currently at our company we are working with applications for Alexa, Google Home and a chatbot, and we started experimenting with Bixby after being invited to develop capsules. And the only reason we are not trying Baidu’s DuerOS is that the documentation is in Chinese, and my French doesn’t go that far. What I’m saying is that with smartphones we had, for the time Windows Phone was around, three platforms to care about, but mainly two. For voice technologies, the number of devices and smart assistants is only growing.

Who has the developers?

So far, Amazon Alexa commands the biggest number of developers and voice application creators. I don’t see that changing for a while, even if the shipment numbers move around over the end-of-year sales. The work Amazon’s evangelist team is doing plays an important role in this; no other company is investing as many resources in listening to, educating and providing resources for skill developers. At the same time, the more application creators there are, the more skills and the more user engagement.
When Samsung made its Bixby announcement, the first thing that came to my mind, and that I commented on, was that I just want to see how much developer attention they can get to develop their first capsules (Samsung calls their assistant applications capsules). If you think about it, Cortana partnering with Alexa is genius in that way: they will be leveraging the environment that Alexa developers already have. I’m not saying the number of smart assistants and devices to build for will be a major problem, because we have tools like Dialogflow and voice app, and frameworks like Jovo, that allow you to develop for multiple platforms, and most likely they will continue to add support for new platforms as they come along. However, I do not see a future in us building skill versions for 5 voice assistants in 5 different English locales, let alone any other language you might want to support.

The voice platform space is different in fundamental ways

Thinking about the huge impact the iPhone had, Apple also had a huge advantage with the App Store when it launched: it was a breakthrough, and no other platform like it existed. Voice technology adoption is going to be fundamentally different from what we have seen until now with websites and mobile apps, because the circumstances are different in terms of building, promotion and usage. I’m betting whoever picks up the most app developers is going to be the platform that rules all the others.

Voice is the platform for creatives, but not yet

I have said before that this is the platform for creatives, but creatives are not the first to come to build. I had a call last week with a musician interested in voice tech, and he was asking for advice on how to make his friends understand that voice is a platform they need to be on. The early adopters are the developers, not the creatives. Essentially, developers’ focus and attention on one platform or another is the ultimate currency for the future of voice applications in a fragmented space.

Let’s keep the conversation on Twitter and Instagram

Our Twitter handle is voicefirstlabs and we are voicefirstweekly on Instagram; shoot us a comment there to let me know what you think, or anything else really. I love to interact about this space. My name is Mari, and I’ll talk to you tomorrow.
