Episode Archives

Advancements in speaking styles in TTS

Yesterday, via the Amazon blog, we learned that the company has been working on a neural text-to-speech (TTS) system that can model speaking styles with only a few hours of recordings. Among the problems in speech synthesis is the lack of tone and emotion. Finding the correct intonation, stress, and duration from written text will probably remain the most challenging problem for years to come (per research at the Helsinki University of Technology).

The way you customize synthesized speech today is through SSML, a speech markup language. SSML allows you to configure prosody, emphasis, and the actual voice used. The problem is that people change their speaking style depending on the context and their emotional state. What Amazon is saying in this announcement is that their latest TTS can learn a newscaster style from just a few hours of training, which is significant because their previous model required tens of hours.
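To make the SSML part concrete, here is a minimal sketch of what configuring voice, prosody, and emphasis can look like. It uses only the standard SSML elements voice, prosody, and emphasis. The helper name build_ssml, the sample sentence, and the voice name "ExampleVoice" are made up for illustration, and which tags and values are actually supported varies by platform.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, emphasized, voice_name, rate="medium", pitch="medium"):
    """Build an SSML string that sets the voice, prosody, and emphasis.

    Hypothetical helper for illustration; it is not part of any TTS SDK.
    """
    speak = ET.Element("speak")
    # <voice> selects which voice reads the text (names are platform-specific).
    voice = ET.SubElement(speak, "voice", name=voice_name)
    # <prosody> adjusts speaking rate and pitch for everything inside it.
    prosody = ET.SubElement(voice, "prosody", rate=rate, pitch=pitch)
    prosody.text = text + " "
    # <emphasis> stresses one part of the sentence.
    emphasis = ET.SubElement(prosody, "emphasis", level="strong")
    emphasis.text = emphasized
    return ET.tostring(speak, encoding="unicode")


# "ExampleVoice" is a placeholder, not a real voice on any service.
ssml = build_ssml(
    "Here are today's",
    "top headlines.",
    voice_name="ExampleVoice",
    rate="fast",
    pitch="low",
)
print(ssml)
```

Notice that every stylistic choice has to be spelled out by hand, tag by tag. That is the limitation a learned speaking style is meant to address.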

This advance paves the way for Alexa and other services to adopt different speaking styles in different contexts, improving customer experiences.

The same approach that works for the newscaster style might work for other styles. Amazon also said they created a news-domain voice.

Listeners rated neutral NTTS more highly than concatenative synthesis, reducing the score discrepancy between human and synthetic speech by 46%.


Let’s listen to the female voice with the newscaster style so you can judge for yourselves.

Very realistic news style, isn’t it?

It is very timely to bring up the results of the Reuters and Oxford report:

The Future of Voice and the implications for news. (I expanded on this in our last newsletter, subscribe.) According to the report, consumers love smart speakers, but they don’t love news from smart speakers. One of the main reasons, the report concluded, is that synthesized voices are hard to listen to for many users.

The report also concluded that news updates are among the least valued features in smart speakers.

Additional points:

This new development of neural TTS by Amazon could mean more customization options for brands looking to get a unique persona and voice on smart devices. This is definitely a very welcome improvement in TTS.

I get more and more interested in synthesized speech every day as I realize it is going to be a fundamental part of the future. That future might not be that far off: last week the Chinese news agency Xinhua announced the “world first” TV anchor at the World Internet Conference. The anchor features a virtual, AI-controlled avatar powered by synthetic speech.

The Revolution will be synthesized, my friends.

Thank you for listening, you have a great day. As always you can find me on Twitter as @voicefirstlabs or Instagram as @voicefirstweekly.

A court with a voice: Montgomery Court conversational services

The Montgomery County court in Dayton, Ohio, has added a new technology to boost its services.

The innovation, an automated virtual assistant and chatbot, will answer many of the most frequently asked questions received by the Montgomery County Clerk of Courts Office by email and phone.

The bot, named Athena after the Greek goddess of wisdom, law and justice, is designed to answer questions directed to the Montgomery County Municipal Courts and Montgomery County Common Pleas Court, as well as the county’s auto title branches. Athena, accessed at www.mcclerkofcourts.org, can also look up basic ticket and case information.

The bot was designed in-house using Microsoft Cognitive Services. Part of the significance of the innovation is that it can answer questions from all five of the office’s divisions, so you can ask about your car, your passport, or how to pay your traffic ticket.

The virtual assistant is also connected to the county clerk’s online public records information system. Athena can use that connection to tell users the status of a case and link them to related documents.

We have talked before about how voice and conversational interfaces can be life-changing for government and city services. The work that VoiceXP did for the city of Ozark, Missouri comes to mind. Instead of browsing all the service websites to find the information you need, or waiting on a call for the status of your application, you can just chat with a bot to ask a question: how is my case going? Or what are the requirements for the DMV in California? It’s pretty life-changing. We humans will always appreciate technology that saves time or provides convenience.

Microsoft keeps pushing its cognitive services as a strong developer option for use cases like this, while at the same time pushing its partnership with Amazon and Alexa. It seems to me that they are trying to get enterprises to use their cognitive services far more than they are pushing Cortana. That’s definitely part of Microsoft’s new direction as an enterprise services company, away from being a consumer company.

Thank you for listening. My name is Mari, and this is the VoiceFirst Weekly flash briefing show. Before I wrap up this episode: a reminder of the upcoming Alexa Conference in Chattanooga, Tennessee, January 15-17. You can sign up at the voicefirst dot fm website. I’ll be there, as well as my cofounder Nersa. I’ll be speaking in the track on podcasting in the age of Alexa, and we’ll also have a booth where you can check us out. Don’t miss it!

It’s time small businesses had big business capability. The AI for small business

The voice community is amazing to watch as it evolves. I have been very fortunate to meet lots of people who are making an impact on voice technology and conversational interfaces. This episode is special because it features one of those startups impacting voice tech every day, and it is the first time the VoiceFirst Weekly show welcomes a guest. We are thrilled about how it turned out.

In this episode, I talk to Brendan Roberts, CEO of Aider, the AI assistant for small businesses. Aider is launching now in Australia and New Zealand, with plans for the US in 2019. Aider will help you answer questions like: What’s my top-selling product? What’s my revenue today? Who is meant to be working tomorrow, and what’s the weather going to be like? All this from your phone, taking your business context into account.

I met Brendan at the Voice Summit back in July after arranging a meetup at the conference for folks from the Voice Entrepreneur Community on Facebook. I got to see Aider firsthand and was really impressed by its capabilities. Aider integrates with several SaaS apps that small business users might be familiar with for sales, accounting and client management, providing insights and learning from the user’s actions. What I thought was really impressive for an app of this type was the ability to keep the conversation going across channels, whether voice or messaging.

Without further ado, please enjoy my talk with Brendan.

You can contact Brendan on Twitter or LinkedIn. You can also try Aider and sign up for Aider beta access.

Voice activated smart glasses available exclusively in stores

A showcase for smart glasses opened yesterday, marking November 12 as the day (per the company) that the world’s first smart glasses store opened. North is the Canadian company that develops futuristic HCI products. The company has raised over 135 million from investors, including the Amazon Alexa Fund.

Focals are the smart glasses the company is presenting, available exclusively in their stores in Brooklyn, New York and Toronto since yesterday. The custom-made glasses feature a display that only you can see. I’m not exactly clear about the technology behind this, as it’s not explained on their webpage, but I’m very curious and excited to know more about it.

Focals include visual summaries, smart text and emojis, and voice to text. They also come with a navigation feature with search, turn-by-turn directions and the ability to hail an Uber.

The display is controlled with a Loop, a small finger ring with a tiny, four-way joystick that’s included in the purchase, along with a case that doubles as a battery charger. The glasses sync with the user’s Android or Apple iOS device via Bluetooth.

Alexa anywhere

Focals come with Alexa built in. According to the showcase page, you can ask Alexa to play music, hear the news, see the weather, control your smart home and more. I’m guessing you can do anything that Alexa allows you to do.

The glasses also come with a function to pause it all when you don’t need them:

Technology that’s there when you need it, gone when you don’t – hidden by design.

Form plus function

The glasses come in stylish designs, a la Warby Parker, maintaining the idea of keeping the technology invisible until you need it. The store is also selling the experience in the shopping process. You have to be custom fitted for Focals. “It’s crucial to understand how the technology looks and feels,” Adam Ketcheson, Chief Marketing Officer of North, said to The Bridge. “It’s incredibly important for people to get a hands-on experience, especially at our price point. The entire retail model is so people can immersively understand what it is and get the right fit.”

Focals will be offered in a variety of styles at $999.

Smart glasses have been emerging and dying for a while now. Google Glass and Intel’s Vaunt shut down in 2015 and 2018, respectively.

What makes Focals different? The focus on design and style, rather than the geeky look of Google Glass, might be a compelling point. Focals are voice activated, but their first selling point is that the technology is there only when needed, otherwise looking like regular glasses. They are not advertised as a geeky technology gadget, but more as a helpful companion.

As often happens with technology advancements, the timing might turn out differently for North’s glasses.

I’m waiting for my next trip to NYC to visit the store and try the Focals. Let me know what you think on Twitter @voicefirstlabs or Instagram at voicefirstweekly. I’m Mari, this is the VoiceFirst Weekly flash briefing, have a great day and I’ll talk to you all tomorrow. We have a special episode tomorrow with the first human guest on the show. Don’t miss it. See ya.

How receptive are smart speaker owners to advertising?

Survata’s September survey of 2,000 smart speaker owners in the US came with one surprising finding: Apple HomePod owners are more likely to be receptive to audio ads than anyone else.

According to the Survata data, as reported by Business Insider:

  • 35% of HomePod owners would be interested in hearing about sponsored products or services on their speaker.
  • Only 22% of Google Home owners said the same.
  • And just 17% of Amazon Echo and Echo Dot owners are receptive to ads on their speaker.

This shows there is still a large chunk of people who don’t want to hear ads on their smart speakers, suggesting it’ll be an unpopular move if anyone introduces sponsored content any time soon. It’s also unlikely Apple would venture into the sponsored content territory, given it has shied away from targeting ads at users.

Survata market research president Dyna Boen explained that anomaly:

While adoption of Apple HomePod has thus far lagged behind Amazon Echo and Google Home, and thus makes up a smaller percentage of the sample, we still are seeing that these users are saying sponsored content ‘very positively impacts their smart speaker experience’ at a statistically significant level.