It wouldn’t be fair to start this summary with anything other than appreciation for the work Bradley Metrock and his team pulled off organizing the conference. As I said the first day at the conference, it’s very strange not to stumble into a technical or logistical problem while setting up a booth at a conference. And yet, that was exactly the experience we had on Tuesday, January 15, when we arrived at the booth. From there, everything continued to run smoothly.
Nersa, co-founder of VoiceFirst Labs, and I were mainly in the Alexa Fair area, talking with people on the floor about the work we do with VoiceFirst Weekly (newsletter and podcast) and VoiceFirst Labs. So my view of the conference is mainly through that lens. I compiled other people’s takeaways, which included sessions. Here’s a list:
The main topics:
Voice is maturing: Bret Kinsella, one of the keynote speakers of the conference and editor of Voicebot.ai, called it a second phase of voice in terms of adoption, competition in the space and the integration of voice tech into existing products. Brian Roemmele has also spoken about this second phase of voice.
Voice in the car: There has been a lot of enthusiasm around the connected car in recent months, as it seems to be the next battleground for voice technology companies.
Nearly twice as many U.S. adults have used voice assistants in the car (114 million) as through a smart speaker (57.8 million). SoundHound
Healthcare generated attention: There was lots of interest from attendees. Voice is making its way into healthcare, and incumbents like Mayo Clinic and Orbita are leading the way.
Sonic branding and personality in voice are gaining momentum: Several of the sessions and talks were dedicated to this topic, a hot one in an industry growing as much as voice. Differentiation through personality signals a way for brands to stand out in voice technology platforms.
Storytelling is getting a space: Tellables and X2Games were among the companies presenting interactive storytelling with smart assistants. What they are doing is at the very least intriguing and worth checking out. Nolan Bushnell, founder of Atari, presented their latest game, St Noire, an intersection between stories, interaction and gaming. Bushnell also pointed out that half a billion dollars is going to be dedicated to interactive entertainment 🤯.
Podcasts and publications like Voicebot.ai, Beetle Moment Marketing, Alexa in Canada, VUX World and, dare I say, VoiceFirst Weekly do provide value for users entering the space. That is the impression I took from taking part in several conversations and observing visitors on the floor.
I was part of the Storytelling track, in the panel Podcasting in the age of Alexa. Audioburst, Gimlet Media and Amplifi were the other companies debating in the panel, which turned out to be very insightful, given that we were all from very different backgrounds. The biggest problem for podcasters in the age of smart assistants is the same as when smart assistants were not in the picture: distribution. According to Steve Goldstein from Amplifi Media, there are 600,000 podcasts, of which 70% are dead. That means a podcast is easy to start, difficult to maintain and even harder to monetize. However, as it turns out, both smart assistants and podcasts have signaled 2019 to be “their” year, as the public and brands are more aware of them and the growth is outstanding. From my point of view, short episodes are going to be paramount soon, as well as storytelling podcasts. In that regard, Gimlet Media is leading the way with the productions that earned them an Alexa Conference prize.
The Alexa Conference social media channels broadcast several live-streamed interviews with presenters at the Alexa Fair and other voice thought leaders. You can catch each of them on the VoiceFirst.fm YouTube channel.
The stage where I spent most of my time was the Alexa Fair Exhibition. VoiceFirst Weekly had a booth in the podcast section. What I highlighted in our Final Thoughts of the Alexa Conference with Ian Utile was how intimate the fair felt. Visitors had the opportunity to talk for half an hour with the presenters. This doesn’t happen at bigger conferences, where you might only have time to exchange cards.
What else did the fair teach me? What companies are doing in the voice space. It was a timely reminder that despite everything I mentioned above about the growth of voice technology, it’s still very early in adoption and business opportunities. My forecast is that there is going to be a lot of pivoting in the years to come, and not all players will make it to home base. That’s fine and expected. Most conversations included the question: what do you think about monetization, or do you know how people are monetizing, or who is monetizing? Along with “we are still figuring it out”. Aren’t we all.
The best of it all, though, was the opportunity to interview presenters from almost all the companies, including executives, who answered two questions: what do they do, and why did they think of the Alexa Conference as a space to showcase their products. These interviews will be available at the end of next week, when we relaunch the VoiceFirst Weekly podcast after a process of rebranding. Stay tuned for those; they are full of insights from the leaders of the companies that are driving voice technology.
Apple’s HomePod will be available for sale in China early next year. A listing is already up on Apple’s China site, with the smart speaker priced at RMB 2,799 (about $407), or about 17% more than its $349 price in the United States. Though Apple doesn’t list an exact shipping date for Chinese buyers, it says the HomePod will be available in early 2019.
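To make the “about 17%” figure concrete, here is a quick back-of-the-envelope check in Python using only the prices quoted above. The variable names are mine, and the implied exchange rate is simply what the two quoted numbers work out to, not an official rate.

```python
# Prices quoted in the paragraph above (variable names are illustrative).
china_price_rmb = 2799   # listing price on Apple's China site
china_price_usd = 407    # approximate USD equivalent quoted above
us_price_usd = 349       # US price

# Relative premium of the China price over the US price.
premium = (china_price_usd - us_price_usd) / us_price_usd

# Exchange rate implied by the two quoted China prices (an inference, not an official rate).
implied_rate = china_price_rmb / china_price_usd

print(f"China premium over the US price: {premium:.1%}")
print(f"Implied RMB per USD: {implied_rate:.2f}")
```

The premium comes out a little under 17%, which matches the rounded figure in the listing coverage.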
Apple has balls, let me tell you that. It’s going to launch a smart speaker in the hottest battleground with a (much) higher price, competing against Tencent, Baidu, Xiaomi and Alibaba’s smart speakers. All of them combined have hundreds of millions of devices installed. And let’s not forget the way less pricey options, as low as $15 on sale. Yep, and options, of which Apple does not have many to show in smart speakers. Is the company hoping that customer loyalty to iPhones is going to translate to the HomePod?
A TechCrunch article illustrates this last point:
Despite formidable competition from Samsung, Huawei, and Xiaomi, Apple held an 11.9% market share in China as of the second quarter of 2018, according to Gartner (referring to the iPhone).
With the announcement of Apple Music on Alexa, we know Apple is focused on its services, or at a minimum on Apple Music. What I would like to see more of is an effort toward a more service-oriented Siri. It’s where every smart assistant platform is moving. Siri’s co-founder said in a CNBC interview that Apple dropped the ball on third-party voice partners, but they are the future. I completely agree.
Google announced yesterday on its Canadian blog the introduction of new Google Assistant experiences for families. The announcement clearly positions Google Assistant around family activities, especially with kids.
Today, we’re introducing new experiences, designed specifically for families with kids so they can learn, play and imagine together.
Among the new experiences, Google says families “will be able to join The Wiggles,” from the Treehouse TV show and go on a “custom Wiggles experience” commissioned by Google exclusively for Assistant.
Google is also introducing news stories appealing to kids through CBC Kids News. It can be invoked with “Hey Google, Play CBC Kids News”. The feature is focused on daily local, national and international stories that are relevant to Canadian kids, with a focus on media literacy. Google Assistant recently launched storytime experiences in partnership with Disney. With these new services for Canadian kids, the search giant is shifting a lot of its focus to kids and the family environment. Moreover, Reuters released a report in which one of the main conclusions is that users don’t love news on smart speakers. By shifting to kids consuming news on Google Assistant, the company is setting up news consumption on its smart speakers for the future. Smart.
Another feature the blog post announced is Boukili Audio, an interactive activity that tests listening and comprehension of stories on animals, nutrition, music, travels and a ton of other captivating subjects, all in French. After listening to stories, Boukili Audio puts your child’s skills to the test with a series of multiple choice questions to evaluate their French language comprehension and their memory, all while having fun. Boukili Audio has over 120 interactive books, over 70 of which are exclusive to the Assistant. Available in French only. To try it out, Canadian users can say: “Ok Google, Parler avec Boukili Audio”.
Just in time for Christmas, Google is announcing calls to Santa: “Hey Google, call Santa”. It’s only available in English. Can’t wait for this to be available in the US to see what Santa has to say.
With their parent’s permission, children under 13 can also have their own personalized Google Assistant experience when they log in with their own account, powered by Family Link (Not available in Quebec). Family Link helps parents manage their child’s Google Account while they explore. And with Voice Match, your family can train the Assistant to recognize up to six voices.
The announcement was on the Google Canada blog yesterday, Wednesday, November 28. It doesn’t specify when the family experiences are going to be available in other territories. Google’s focus on families and kids for the holidays is very smart. It proves the company is looking ahead and is not as concerned with “fixing” current users’ views. After all, Generation V is the real protagonist in voice technology.
Perfect for the Holidays 🎄
Yesterday, via the Amazon blog, we learned that the company has been working on a neural TTS that can model speaking styles with only a few hours of recordings. Among the problems in speech synthesis is the lack of tone and emotion. Finding the correct intonation, stress and duration from written text is probably the most challenging problem for years to come (per research from the Helsinki University of Technology).
Today, the way you customize synthesized speech is through SSML, a speech markup language. SSML allows configuring prosody, emphasis and the actual voice used. The problem is that people change their speaking style depending on context and their emotional state. What Amazon is saying in this announcement is that their latest TTS can learn a newscaster style from just a few hours of training, which is significant because their previous model required tens of hours.
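For anyone who has not seen SSML before, here is a minimal sketch of what configuring prosody, emphasis and voice looks like. It builds a small SSML document with Python’s standard library. The `build_ssml` helper, the sample sentence and the default voice name are my own illustrative assumptions; which voices and attribute values are actually supported depends on the platform you are targeting.

```python
import xml.etree.ElementTree as ET


def build_ssml(text_before, emphasized, text_after, voice_name="Matthew"):
    """Build a small SSML document (hypothetical helper, for illustration only)."""
    speak = ET.Element("speak")
    # <voice> selects the voice; the name here is an assumed example value.
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    # <prosody> adjusts speaking rate and pitch for everything it wraps.
    prosody = ET.SubElement(voice, "prosody", {"rate": "slow", "pitch": "+5%"})
    prosody.text = text_before
    # <emphasis> stresses one phrase inside the sentence.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = emphasized
    emphasis.tail = text_after
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Good evening. ", "Breaking news", " from the voice industry.")
print(ssml)
```

Notice that everything here is set by hand, tag by tag. That is the limitation a learned speaking style is meant to get around: the style comes from the model rather than from markup the author has to write for every sentence.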
This advance paves the way for Alexa and other services to adopt different speaking styles in different contexts, improving customer experiences.
The same approach that works for the newscaster style might work for other styles. Amazon also said they created a news-domain voice.
Listeners rated neutral NTTS more highly than concatenative synthesis, reducing the score discrepancy between human and synthetic speech by 46%.
Let’s listen to the female voice with the newscaster style and judge for yourselves.
Very realistic news style, isn’t it?
It is very timely to bring up the results of the Reuters and Oxford report:
The Future of Voice and the implications for news. (I expanded on this in our last newsletter, subscribe). According to the report, consumers love smart speakers, but they don’t love news from smart speakers. One of the main reasons, the report concluded, is that synthesized voices are hard to listen to for many users.
The report also concluded that news updates are among the least valued features in smart speakers.
This new development in neural TTS by Amazon could mean more customization options for brands looking to get a unique persona and voice on smart devices. This is definitely a very welcome improvement in TTS.
I get more and more interested in synthesized speech every day, as I realize it is going to be a fundamental part of the future. That future might not be that far off: last week the Chinese news agency Xinhua announced the “world first” AI TV anchor at the World Internet Conference. The anchor features a virtual, AI-controlled avatar powered by synthetic speech.
The Revolution will be synthesized, my friends.
Thank you for listening, you have a great day. As always you can find me on Twitter as @voicefirstlabs or Instagram as @voicefirstweekly.
The Montgomery County court in Dayton, Ohio, has added a new technology to boost its services.
The innovation, an automated virtual assistant and chatbot, will answer many of the most frequently asked questions received by the Montgomery County Clerk of Courts Office by email and phone.
The bot, named after Athena, the Greek goddess of wisdom, law and justice, is designed to answer questions directed to the Montgomery County Municipal Courts and Montgomery County Common Pleas Court, as well as the county’s auto title branches. Athena, accessed at www.mcclerkofcourts.org, can also look up basic ticket and case information.
The bot was designed in-house using Microsoft Cognitive Services. Part of the significance of the innovation is that it can answer questions from their five divisions, so you can ask about your car, or passport or how to pay your traffic ticket.
The virtual assistant is also connected to the county clerk’s online public records information system. Athena can use that connection to tell users the status of a case and link them to related documents.
We have talked before about how voice and conversational interfaces can be life-changing for government and city services. The work VoiceXP did for the city of Ozark, Missouri, comes to mind. Instead of browsing all the services websites to find the information you need, or waiting on a call for the status of your application, you can just chat with a bot to ask a question: how is my case for this going? Or what are the requirements for the DMV in California? It’s pretty life changing. We humans will always appreciate technology that saves time or provides convenience.
Microsoft keeps pushing its cognitive services as a strong developer option for use cases like this, while at the same time pushing its partnership with Amazon and Alexa. It seems to me that they are trying to get enterprises to use their cognitive services way more than they are pushing Cortana. That’s definitely part of Microsoft’s new direction as an enterprise services company, away from being a consumer company.
Thank you for listening. My name is Mari, and this is the VoiceFirst Weekly flash briefing show. Before I wrap up this episode: a reminder of the coming Alexa Conference in Chattanooga, Tennessee, January 15-17. You can sign up at the voicefirst dot fm website. I’ll be there, as well as my cofounder Nersa. I’ll be speaking on the panel Podcasting in the age of Alexa, and we’ll also have a booth where you can check us out. Don’t miss it!
The voice community is amazing to watch as it evolves. I have been very fortunate to meet lots of people who are making an impact on voice technology and conversational interfaces. This episode is special because it features one of those startups impacting voice tech every day, and it is the first time the VoiceFirst Weekly show welcomes a guest. We are thrilled about how it turned out.
In this episode, I talk to Brendan Roberts, CEO of Aider, the AI assistant for small businesses. Aider is launching now in Australia and New Zealand, with plans for the US in 2019. Aider will help you answer questions like: What’s my top selling product? What’s my revenue today? Who is meant to be working tomorrow and what’s the weather going to be like? All this from your phone, taking your business context into account.
I met Brendan at the Voice Summit back in July after arranging a meetup of folks at the conference from the Voice Entrepreneur Community on Facebook. I got to see Aider first hand and was really impressed by its capabilities. Aider integrates with several SaaS apps that small business users might be familiar with for sales, accounting and client management, providing insights and learning from the user’s actions. What I thought was really impressive for an app of this type was the ability to carry the conversation across channels, whether voice or messaging.
Without further ado, please enjoy my talk with Brendan.
A showcase for smart glasses opened yesterday, marking November 12 as the day (per the company) that the world’s first smart glasses store opened. North is a Canadian company that develops futuristic HCI products. The company has raised over 135 million from investors, including the Amazon Alexa Fund.
Focals are the smart glasses the company is presenting, exclusively available in their stores in Brooklyn, New York and Toronto since yesterday. The custom-made glasses feature a display that only you can see. I’m not exactly clear about the technology behind this, as it’s not explained on their webpage, but I’m very curious and excited to know more about it.
Focals include visual summaries, smart text and emojis, and voice to text. They also come with a navigation feature with search, turn-by-turn directions and the ability to hail an Uber.
The display is controlled with a Loop, a small finger ring with a tiny, four-way joystick that’s included in the purchase, along with a case that doubles as a battery charger. The glasses sync with the user’s Android or Apple iOS device via Bluetooth.
Focals come with Alexa built in. According to the showcase page, you can ask Alexa to play music, hear the news, see the weather, control your smart home and more. I’m guessing you can do anything that Alexa allows you to do.
The glasses also come with a function to pause it all when you don’t need them:
Technology that’s there when you need it, gone when you don’t – hidden by design.
Form plus function
The glasses come in stylish designs, a la Warby Parker, maintaining the idea of keeping the technology invisible until you need it. The store is also selling the experience in the shopping process. You have to be custom fitted for Focals. “It’s crucial to understand how the technology looks and feels,” Adam Ketcheson, Chief Marketing Officer of North, said to The Bridge. “It’s incredibly important for people to get a hands-on experience, especially at our price point. The entire retail model is so people can immersively understand what it is and get the right fit.”
Focals will be offered in a variety of styles at $999.
Smart glasses have been emerging and dying for a while now. Google Glass and Intel’s Vaunt were shut down in 2015 and 2018, respectively.
What makes Focals different? The focus on design and style, rather than the geeky look of Google Glass, might be a compelling point. Focals are voice activated, but their first selling point is that the technology is there only when needed, and otherwise they look like regular glasses. They are not advertised as a geeky technology gadget, but more as a helpful companion.
As often happens with technology advancements, this time the timing might work out differently for North’s glasses.
I’m waiting for my next trip to NYC to visit the store and try the Focals. Let me know what you think on Twitter @voicefirstlabs or Instagram @voicefirstweekly. I’m Mari, this is the VoiceFirst Weekly flash briefing, have a great day and I’ll talk to you all tomorrow. We have a special episode tomorrow with the first human guest on the show. Don’t miss it. See ya.
Survata’s September survey of 2,000 smart speaker owners in the US came with one surprising finding: Apple HomePod owners are more likely to be receptive to audio ads than anyone else.
According to the Survata data, as reported by Business Insider:
This shows there is still a large chunk of people who don’t want to hear ads on their smart speakers, suggesting it’ll be an unpopular move if anyone introduces sponsored content any time soon. It’s also unlikely Apple would venture into the sponsored content territory, given it has shied away from targeting ads at users.
Survata market research president Dyna Boen explained that anomaly:
While adoption of Apple HomePod has thus far lagged behind Amazon Echo and Google Home, and thus makes up a smaller percentage of the sample, we still are seeing that these users are saying sponsored content ‘very positively impacts their smart speaker experience’ at a statistically significant level.
As they say, sometimes it is better to wait before reporting on some news. I feel Wednesday’s episode on Bixby could’ve waited until yesterday, when I went to the Samsung Developer Conference and got more context and details. If you didn’t listen to yesterday’s episode for some reason, here’s the summary: Samsung opened Bixby to developers, we were part of the developer beta program, and VoiceFirst Weekly now has a capsule. The words “game changing” were said. Perhaps I didn’t completely understand my own words on Wednesday. When I was at SDC yesterday, I realized Bixby is definitely and completely going to change the voice game. You might ask: didn’t you say you had developed capsules for Bixby already? Yes, we absolutely did. They are on the Bixby showcase page. The thing is, as I said, Samsung might be rushing a little to enter the race. As such, some things I saw first hand at SDC were not covered in the documentation. Maybe they wanted to unveil them during SDC. The truth is I saw the camera put some makeup on my face, then show me a list of those same lipsticks or mascaras, and then show a list of places where I could buy them, right there. I saw Bixby recognize a bottle of wine and show reviews and prices. I saw it read signs and translate them right there from the camera.
This was all part of Bixby Vision, a Samsung S8+ app (some features are S9+ only) powered by image recognition. I have read reviews saying that Bixby is sometimes not as accurate as other smart assistants in speech recognition, but all these features, combined with the ability to learn, are a powerful point in favor of Bixby. All of that is now open for developers to interact with.
Among the other announcements at SDC were the coming Marketplace for Bixby capsules in 2019 and the expansion to five new languages in the coming months. I think what I’m saying is that Samsung might have figured out multimodal commerce right in front of everyone. Without AR or VR, just the camera. Certainly Bixby is here to change the game, plus we cannot ignore all the phones and appliances Samsung makes. They even have HARMAN, the market leader in connected car solutions.
Bixby is coming to everything.
Here is a video of my interaction with Bixby Vision:
Samsung announced yesterday at the Developer Conference in San Francisco that the Bixby platform is open to developers. The Bixby Developer Studio had been in private beta until now. Nersa, my cofounder at VoiceFirst Labs, and I were lucky enough to be in the beta program and the contest for the creation of the first capsules (Bixby voice apps). I heard that name might change, and I’m happy about it; capsule is definitely not a good name for a voice app.
We developed two capsules, with the intention of understanding the platform: one for number facts and the other for getting episodes of this show. You heard correctly, the VoiceFirst Weekly flash briefing is already available in Bixby, yay!
Experience creating for Bixby
The developer experience still feels a little raw. The platform clearly needs polishing, in the documentation and elsewhere, and I feel they are in a rush to open it up, with the clock ticking.
The capsules were created in a weekend or less, after watching some of the videos provided and then following the documentation. That means it’s relatively straightforward to start creating something for the platform. The documentation is geared towards developers, but we found it pretty useful.
The good, the bad and the ugly
The good part about the platform is the ability to remember an answer or similar answers by instruction. That’s a pretty sweet deal which, in short, means you don’t need to list all the utterances for an intent. It learns. I really liked that. The way you build the capsule itself is also different from how you develop voice apps for Alexa or Google Assistant. The IDE was decent enough; it felt smooth.
The bad is the maturity of the platform. It is definitely not at the level of the likes of Amazon or Google.
The ugly: as far as we could see, and we tried, the platform is more focused on visual interfaces, and it does not play audio. When we tried to get the audio of the show played directly by Bixby (my expectation was that it was going to be similar to the cards in Google Assistant), we quickly hit a wall. I’m sure they are gonna correct that, but at this point it feels a little outdated already.
The Bixby platform and the developer studio might be a game changer in the smart assistants race. The Bixby team has a different, novel idea of how an assistant should behave, and I’m expecting the competition to only be good for the voice ecosystem overall.
If this catches on, Samsung will have the “phone advantage”, which in their case is not only phones but all kinds of appliances. That is the possibility of instantly having their platform on all these devices, without having to convince users to buy a smart speaker. Although they did release the Galaxy Home a couple of months ago, and for sure the whole Bixby ecosystem will work there as well. All in all, exciting times ahead.
This is the VoiceFirst Weekly flash briefing. My name is Mari, and as always you can find me on Twitter as @voicefirstlabs or Instagram as @voicefirstweekly. You have a great day and I’ll talk to you all tomorrow.
P.S. We will be at the Developer Conference today during the announcement of the capsules contest winners. Expect live updates on Twitter.