Episode Archives

Will Voice Technology be polluted with ads

Hello, happy Sunday.
Last week Ben Smith from Voice Entrepreneur asked on Instagram: Will Voice Technology be polluted with ads? 

My response was: Our ears are way more sensitive than our eyes. Open lots of pages with ads today and imagine the same thing in voice: it’s unbearable. That’s why ads in voice will be different and the field has to be smarter on voice platforms. And that’s why branding done right is gonna be so important, and so hard, on voice platforms.
In the episode on why branding in voice is a necessary challenge, I mentioned a quote saying that voice is going to be the biggest challenge for brands since the internet.
What do you think? I woke up like this!

Have a nice Sunday and a productive start to the week, we’ll talk tomorrow!

How to get your website picked up by Alexa and Siri when a person asks a question


Do you know how Alexa and Siri decide which website to pull from when they are asked a question?
Researchers from digital strategy agency iProspect share their view on how it works in a study they conducted, The Future is Voice Activated.
These are the items they say a website should have to be chosen by the smart assistants Alexa and Siri:
The first one is that the site must be popular online already. For now it seems that being popular keeps giving and giving.
The site should have a solid reputation. The quality of a smart assistant’s answers is very important for the platforms, which need to keep their authority and avoid losing users over bad responses. Sites should have reputable links.
The site’s optimized for conversational queries. Better be prepared to respond to longer queries in a conversational format. One of the first steps you can take is to implement schema markup in your pages. Google has some documentation on how to do it, and the vocabulary itself lives at schema.org.
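As a rough illustration of what implementing schema can look like, here is a minimal Python sketch that builds schema.org FAQPage markup as JSON-LD. The helper names (faq_jsonld, as_script_tag) and the sample question are made up for this example, and nothing in the study says this exact markup is what Alexa or Siri read; it only shows the general shape of question-and-answer structured data.

```python
import json


def faq_jsonld(pairs):
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Wrap the markup in the script tag that goes into the page's HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical content, phrased the way a person would ask it out loud.
markup = faq_jsonld([
    ("How long does shipping take?", "Orders usually arrive in 3 to 5 days."),
])
print(as_script_tag(markup))
```

Writing the question the way someone would say it is the conversational part; the markup just labels which text is the question and which is the answer.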

It loads quickly. Conversations have this immediacy: people expect responses faster than they do from traditional web pages, where there is usually a visual cue for the loading process. If your site content doesn’t change often, cache it. Be careful, as Phil Karlton said:

There are only two hard things in Computer Science: cache invalidation and naming things.

Wanted to put this quote there today.
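To make the caching advice and the warning concrete, here is a minimal sketch of a time-based cache in Python. TTLCache, render_page and the 300 second lifetime are all hypothetical names and values picked for this example, not something from the study. The invalidate method is the hard part the quote is talking about: you have to remember to call it whenever the content changes.

```python
import time


class TTLCache:
    """Keep rendered pages in memory for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without waiting
        self._store = {}

    def get(self, key, loader):
        """Return the cached value, or call loader(key) if it is missing or stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = loader(key)
        self._store[key] = (now, value)
        return value

    def invalidate(self, key):
        """Drop an entry right away, for example after the page was edited."""
        self._store.pop(key, None)


def render_page(path):
    # Stand-in for the slow work: database queries, templates, and so on.
    return "<html>content for " + path + "</html>"


cache = TTLCache(ttl_seconds=300)
page = cache.get("/faq", render_page)        # slow the first time
page_again = cache.get("/faq", render_page)  # served from memory
```

A short lifetime keeps stale answers from hanging around; a long one keeps responses fast. That trade-off is yours to pick.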

The last one: the site’s content is in the same language the user speaks.
We don’t know for sure how Alexa or Siri picks a website to pull from. Either way, these are definitely nice guidelines for your sites. Note that the biggest change in this list is optimization for conversational queries, and nothing is said about ranking. Surely there has to be some way to prioritize websites, and also skills and actions. As we move towards a more voice activated world, it will become increasingly important to have content optimized for voice queries.
Find the study in the episode notes at voicefirstweekly.com/flashbriefing/86.

Thank you for listening. What topics would you like to see featured more? Spread the love and subscribe. Feeling more excited every day. You have a great day, and we’ll talk tomorrow!

China’s strong stance on voice technology and conversational interfaces

Today, I want to focus on the strong stance Chinese companies have on voice technology and smart assistants and how they are driving a low cost smart assistant market.
Former President of Google China, Kai-Fu Lee, said at the Disrupt 2018 conference in San Francisco this week, that his former company doesn’t have very good odds of success if it decides to re-enter the Chinese market. Lee suggested that Google’s current management doesn’t have the right stuff to compete in China’s growing and rough-and-tumble Internet markets. Lee also said it would be hard to find employees as Chinese companies are innovating and new grads prefer to work for Chinese companies. Article from Business Insider.

China is also a market that has proven hard for US companies, and neither Amazon Echo nor Google Home has penetrated China.

Smart voice is one of the Chinese government’s four main focus areas in its first wave of AI applications throughout the country. The other three are healthcare, smart cities, and autonomous vehicles.
And the biggest companies are all in.

Low cost smart speakers

Alibaba sold its Tmall Genie smart speakers for $15 in China on Singles’ Day, the country’s big annual shopping day on November 11. Baidu recently cut the price of one of its smart speakers in China from $39 to $14. These prices are pushing smaller companies out of the competition and driving high volume usage.
Alibaba’s customer service chatbot got upgraded ahead of the coming November 11. The bot is used by more than 600,000 merchants on its e-commerce sites.

Business models blurring lines

These companies are also changing business models by partnering with US tech companies to use their hardware with their assistants outside of China and partnering with Chinese voice AI startups, or developing conversational AI software in-house to sell within China, where US tech companies face strict restrictions.
Emerging players, partnerships, business models, and a low-cost, high-volume smart speaker market are what’s driving China’s voice technology.

Thank you for listening. Remember to subscribe, like, comment and share these episodes. My name is Mari, and you can find me on Twitter as voicefirstlabs and on Instagram @voicefirstweekly. Thank you for listening and you have a great day!

Power up your apps with cognitive services. An analysis – part 2

Amazon AWS:

Amazon’s machine learning services have a pay-as-you-use pricing model and provide the typical scalability AWS services are known for. The heaviest marketing point is that it’s the same technology that powers Alexa.

  • Amazon Lex, for developing conversational interfaces and bots. It provides automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text.
  • Amazon Polly is a text to speech service that synthesizes speech. Multiple languages and voices to choose from. I have made some demos for Amazon Polly and I like that you can download the audio or stream it directly to S3 from the webpage, without even calling the API. Useful for voice apps. The free tier includes 1 million characters per month.
  • Amazon Rekognition, with a k (https://aws.amazon.com/rekognition/?nc2=h_a1). Identifies objects, people, text, scenes, and activities, and detects inappropriate content. Files to analyze have to be in S3, obviously.
  • Amazon Translate (https://aws.amazon.com/translate/?p=tile). Translation services. The advertised use case is a curious one, at least to me: enabling multilingual sentiment analysis of social media content.
  • Amazon Transcribe provides automatic speech recognition from audio or video in common formats. One use case is transcribing service calls. The service adds timestamps to the text, which can be really useful for captioning and subtitles. You can even try it from the console (https://console.aws.amazon.com/transcribe/home?region=us-east-1#createJob)
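If you do want to call Polly from code instead of the webpage, a minimal sketch in Python could look like the one below. It assumes the boto3 library and AWS credentials on your side; the helper names (polly_request, synthesize), the voice "Joanna" and the output file name are choices made for this example. The client is passed in as an argument, so the sketch itself never touches your AWS account.

```python
def polly_request(text, voice_id="Joanna", output_format="mp3"):
    """Build the keyword arguments for Polly's synthesize_speech call."""
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize(client, text, **options):
    """Ask Polly to speak `text` and return the audio as bytes.

    `client` is expected to be a boto3 Polly client (an assumption of this
    sketch); anything with a compatible synthesize_speech method works.
    """
    response = client.synthesize_speech(**polly_request(text, **options))
    return response["AudioStream"].read()


# Real usage, assuming boto3 is installed and credentials are configured:
#
#   import boto3
#   audio = synthesize(boto3.client("polly"), "Hello from my voice app")
#   with open("hello.mp3", "wb") as f:
#       f.write(audio)
```

Every character you send counts against the free tier mentioned above, so cache the audio for phrases you repeat a lot.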

Google Speech and text processing services and APIs

  • Cloud Natural Language API provides text analysis. Where Google has an advantage is that it can recognize entities such as companies, consumer goods like phones, and locations https://cloud.google.com/natural-language/
  • Cloud Speech API. Speech to text recognition. Google’s strong point is that it recognizes 120 languages. It’s separated into models like command and search, phone calls, and video. The models are optimized for specific cases, like the video model for the kind of transcription YouTube provides. https://cloud.google.com/speech-to-text/
  • Cloud translation API https://cloud.google.com/translate/

The winner: There is no winner, there never is. I spent two episodes showing you the trade-offs, so choose accordingly. Do you need 100 languages? Choose Google. Do you need content moderation? Choose Microsoft’s services. Do you have other applications on AWS services like S3? Then choose Amazon. Above all, I choose whatever the team has more experience with if the deadline is close.

Pricing varies per service, and I encourage you to evaluate it before choosing a service.

Other comparisons

I didn’t include IBM Watson in this one, so I’ll leave you with other comparisons that might be useful. They include IBM Watson, which I decided not to explore:

https://softarex.com/blog/cloud-cognitive-services-analysis/
https://www.altexsoft.com/blog/datascience/comparing-machine-learning-as-a-service-amazon-microsoft-azure-google-cloud-ai-ibm-watson/

Power up your apps with cognitive services. An analysis – part 1


I was watching some videos of the talks at Voice Summit that I couldn’t attend, and Noelle’s talk gave me an idea for an app. Then I started researching cognitive services: what are the prices, what does one give me that the other doesn’t, what trade-offs am I willing to accept? I’m telling you, it’s the real FOMO. What if I choose this one but then it doesn’t have everything I want for my app? The fear is real. Why would you want to know about cognitive services? Because sometimes your app is not going to be powered by smart assistants. You don’t have to start over: you can leverage cognitive services and start adding speech recognition or sentiment analysis to your current applications.

The comparison

So I decided to do a comparison. It lacks the real test, which would be to try a set of the APIs and services and build a mini test app with all the contenders, but I cringed at how much time that would take. Tick tock, you gotta choose!
The criterion that matters to me: even if I get excited tinkering with technology, there is only so much time and mental load you can bargain with. I’m familiar with AWS and how it works, so it’s easier for me to lean that way, because I can rely on previous, similar knowledge. That is something you should take into account as well, whether you are a company or a solo developer, when choosing a service or even a library. I have played with several of Google’s APIs, and when I say played I mean I haven’t deployed that code to a live production environment, where the problems are just different.

Having said that, the first one I looked at was Microsoft. I have been intrigued by their services for a while. I concentrated on Language and Speech. They also have Vision, Knowledge and Search services.
The fundamentals:

Speech

  • Speaker recognition. Identify who is speaking given an input audio. They had examples and the comments were being answered. I liked that, good signs. The pricing table will be available in the episode notes, along with an example in Java on GitHub.
  • Text to speech. The text to speech has a demo for you to try; the voices are not that good. They do provide a custom voice service (https://cris.ai/Home/CustomVoice)
  • The one I really liked is automatic speech transcription. With the news that they are adding it to OneDrive video and audio files, you have a strong use case.
  • Speech translation. The advantage is that you can use it across devices, in 10 languages. I remember I was at a conference in Peru last year, and the keynote by a Microsoft presenter was automatically translated by Skype. The translation at the time was not exactly accurate, but it got me thinking about how far this could go. (https://azure.microsoft.com/en-us/services/cognitive-services/speech-translation/)

Language

  • Text analytics, the main one here is sentiment analysis. (https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/)
  • Translator text. Automatic language detection
  • Bing spell check, well I don’t use Bing a lot.
  • Language understanding with their platform LUIS.
  • Content moderator. This one seems interesting to me. What if a lot of apps are relying on the same moderator rules? Who decides what’s accepted or not? But it’s also one of the few services I know of that provides this.

All these services are in preview, so I recommend you take that into account. I didn’t talk about all the pricing because it’s a lot: each individual service has its own pricing. Most pricing models are based on thousands of transactions.
The thing about cloud services is that once you start adding services and the usage grows, you only know what you are paying when it’s too late. In general, for playing around or starting applications, the free tier should suffice.
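One way to avoid finding out too late is to run the numbers before you ship. Here is a small Python sketch of a per-thousand-transactions estimate with a free tier. The function name, the prices and the free tier size are all made up for this example, and it assumes usage is billed proportionally, so check each service’s real pricing page for the actual rules.

```python
def monthly_cost(transactions, price_per_thousand, free_transactions=0):
    """Estimate a monthly bill under a per-thousand-transactions model.

    Assumes billing is proportional (no rounding up to whole blocks of a
    thousand), which may not match how a given provider actually bills.
    """
    billable = max(0, transactions - free_transactions)
    return billable / 1000 * price_per_thousand


# Hypothetical numbers: 1.00 per thousand calls, first 5,000 calls free.
for calls in (4_000, 50_000, 250_000):
    print(calls, monthly_cost(calls, price_per_thousand=1.0, free_transactions=5_000))
```

Run it with your own expected traffic for each service you plan to add, then sum the results to see the whole bill.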

This will continue tomorrow!