Flash briefings

Summaries of every one of our flash briefings

Power up your apps with cognitive services. An analysis – part 1


I was watching some videos of the Voice Summit talks I couldn't attend, and Noelle's talk gave me an idea for an app. Then I started researching cognitive services: what the prices are, what one gives me that the other doesn't, what trade-offs I'm willing to accept. I'm telling you, it's real FOMO. What if I choose this one, but then it doesn't have everything I want for my app? The fear is real. Why would you want to know about cognitive services? Because sometimes your app isn't going to be powered by smart assistants. You don't have to start over: you can leverage cognitive services and start adding speech recognition or sentiment analysis to your current applications.

The comparison

So I decided to do a comparison. This is lacking the real test comparison, which would be to take a set of the APIs and services and try to build a mini test app with all the contenders, but I cringed at how much time that sounded like. Tick tock, you gotta choose!
The criterion that matters here: even if I get excited by tinkering with technology, there is only so much time and mental load you can bargain with. I'm familiar with AWS and how it works, so it's easier for me to lean that way, because I can rely on previous, similar knowledge. That should be something you take into account as well, whether you're a company or a solo developer, when choosing a service or even a library. I have played with several of Google's APIs, and when I say play I mean I haven't deployed that code to a live production environment, where the problems are just different.

Having said that, the first one I looked at was Microsoft. I have been intrigued by their services for a while. I concentrated on Language and Speech; they also have Vision, Knowledge, and Search services.
The fundamentals:

Speech

  • Speaker recognition. Identify who is speaking from an input audio. They had examples, and the comments were being answered; I liked that, a good sign. The pricing table will be available in the episode notes. There is also a Java example on GitHub.
  • Speech to text. The speech to text has a demo for you to try; the voices are not that good. They also provide a custom voice service (https://cris.ai/Home/CustomVoice). A minimal sketch of calling the speech-to-text endpoint follows this list.
  • The one I really liked is automatic speech transcription. With the news that they are adding it to OneDrive video and audio files, you have a strong use case.
  • Speech translation. The advantage is that it can be used across devices, in 10 languages. I remember being at a conference in Peru last year where the keynote, by a Microsoft presenter, was automatically translated by Skype. The translation at the time was not exactly accurate, but it got me thinking about how far this could go. (https://azure.microsoft.com/en-us/services/cognitive-services/speech-translation/)
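To make the speech-to-text option concrete, here is a minimal sketch in Python. It assumes the Speech service's REST endpoint for short audio; the region, subscription key, and WAV file name are placeholders you would replace with your own, and the exact endpoint shape is worth double-checking against Microsoft's current docs.

```python
import requests

# Minimal sketch, assuming the Speech service REST endpoint for short audio.
# REGION, KEY, and the file name below are placeholders (assumptions), not
# values from this article.
REGION = "westus"                      # your Speech resource's Azure region
KEY = "YOUR_SUBSCRIPTION_KEY"          # from the Azure portal
URL = (f"https://{REGION}.stt.speech.microsoft.com"
       "/speech/recognition/conversation/cognitiveservices/v1")

with open("sample.wav", "rb") as f:    # 16 kHz, 16-bit mono PCM works best
    audio = f.read()

response = requests.post(
    URL,
    params={"language": "en-US"},
    headers={
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    },
    data=audio,
)
response.raise_for_status()
# The simple response format carries the transcript in "DisplayText".
print(response.json().get("DisplayText"))
```

The same subscription-key header pattern shows up across the other cognitive services, which makes it easy to experiment with several of them from one small script.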

Language

  • Text analytics; the main one here is sentiment analysis (see the sketch after this list). (https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/)
  • Translator Text, with automatic language detection.
  • Bing Spell Check. Well, I don't use Bing a lot.
  • Language understanding with their platform, LUIS.
  • Content moderator. This one seems interesting to me: what if a lot of apps rely on the same moderation rules? Who decides what's accepted or not? But it's also one of the only services I know of that provides this.
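As a taste of the Language side, here is a minimal sketch of a sentiment analysis call against the Text Analytics REST API (v2.0 at the time of writing). The region, key, and example texts are placeholders and assumptions of mine; verify the request shape against the current API reference.

```python
import requests

# Minimal sketch, assuming the Text Analytics v2.0 sentiment endpoint.
# REGION and KEY are placeholders you must supply.
REGION = "westus"
KEY = "YOUR_SUBSCRIPTION_KEY"
URL = (f"https://{REGION}.api.cognitive.microsoft.com"
       "/text/analytics/v2.0/sentiment")

payload = {
    "documents": [
        {"id": "1", "language": "en", "text": "I loved the keynote demo."},
        {"id": "2", "language": "en", "text": "The pricing page confuses me."},
    ]
}

response = requests.post(
    URL,
    headers={"Ocp-Apim-Subscription-Key": KEY},
    json=payload,
)
response.raise_for_status()
# Each document comes back with a score between 0 (negative) and 1 (positive).
for doc in response.json()["documents"]:
    print(doc["id"], doc["score"])
```

That 0-to-1 score is enough to start tagging incoming reviews or support messages without training any model of your own.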

All these services are in preview, so I recommend you take that into account. I didn't talk about all the pricing because there's a lot of it; each individual service has its own pricing. Most models are priced per thousand transactions.
The thing about cloud services is that once you start adding services and the usage grows, you only know what you're paying when it's too late. In general, for playing around or for starter applications, the free tier should suffice.

This will continue tomorrow!

The Revolution will be synthesized

Synthesized media is a category surging with the latest developments in synthetic voices and speech-to-text technology. It's not new, but it is re-emerging with the development of machine learning algorithms. Synthesized media refers to information types synthesized by computers (e.g., text, graphics, and computer animation) under computer control.

Synthesized media today:

We are not that far away from that future if we look at current examples of synthesized celebrities:
Lilmiquela on Instagram takes photos with influencers and has more than 1 million followers, who don't seem to care that she doesn't actually exist.
Japanese pop star Hatsune Miku appears as a hologram in concert venues. Thousands pay to see her "live".
Other examples of the advances of synthesized media include:
Lyrebird, a company that created a service that listens to your voice for five minutes and can then sound like you saying anything. Lyrebird published a video almost a year ago of Donald Trump's synthesized voice on their social media. Other companies offer this service (the banking of your voice) as well; you can check them out in our text-to-speech services episode.

Another example: researchers at the University of Washington used AI to synthesize a video of President Barack Obama speaking, based on footage from his weekly addresses.

Debunking of synthesized media, but wait

The World Federation of Science Journalists published a tweet earlier in May calling for the development of "robust processes for debunking of synthesized media". The article was based on the synthesized Obama video and warned against the dangers of deepfakes. But it also highlighted an opportunity for the media, and I quote:

The media itself is a simulacrum of reality, in which each selection, edit, highlight, or turn of phrase shapes the audience’s interpretation of events. What’s new here is that media-synthesis algorithms further fracture any expectation of authenticity for recorded media while enabling a whole new scale, pervasiveness, potential for personalization, and ease of use for everyone from comedians to spies. Faked videos could upset and alter people’s formation of accurate memories around events.

Will the future be synthesized?

What will happen when nothing seen online can be trusted? What will happen while we ride out the authenticity awareness gap?
According to CJR, it's an opportunity for the media to ramp up training in media forensics techniques. However, much of this technology is still far from the cheap availability that would make it practical for reporters.

Media forensics is new to me, as it probably is for a lot of you as well. There are some sources to read in the episode notes at voicefirstweekly.com/flashbriefing/82 (yeah, this one). It's about following a rigorous process to ensure that what's published is authentic. However, as I was reading, I wondered: there is so much new media being created every day that this process of media forensics by journalists might just not be enough. What if we decide to surrender to the fact that we will have synthesized media? What measures will we take? Will we become more tribal, with a trusted authority discerning information for the group?
And what if our future news anchors are synthesized videos combined with text-to-speech services? The Hatsune Miku of news.

Thank you for listening. Remember to subscribe, like, comment, and share these episodes. My name is Mari, and you can find me on Twitter as voicefirstlabs and on Instagram @voicefirstweekly. Thank you for listening, and have a great day!

Every platform company should have evangelists. Start your week in voice with this

During the IFA 2018 conference in Berlin last week, Amazon announced that Alexa has 50,000 skills worldwide, works with 20,000 devices, and is used by 3,500 brands. Big numbers; what do they tell us, though?

Precisely last week, but before the announcement, I did an episode on voice space fragmentation. The main highlight there that is relevant to Amazon's latest announcement is that Amazon has the developer ecosystem, and no one is close to doing the education and evangelism work the Alexa team is doing with developers. That is playing out in the number of skills worldwide, and it will continue to play out. Today, as companies compete for developer attention, every company should have developer evangelists, not just HR or sales. This is not a new concept; it's just taken to a different level. The biggest companies get this, but it's not enough to put some tutorials out there: it's the careful work of listening and responding to developers' concerns, educating, and relating to developers at a human level on social media. I understand not all companies have the resources to pull this off, but the future voice space is going to be more fragmented, and more companies will try to stand out. The authenticity of your interactions with the users of your platform is going to be (and for Amazon already is) a cornerstone.

Developer attention is the currency today for platforms

The Amazon Alexa team is currently doing this better than any other smart assistant platform (I would argue better than anyone else), and it's drawing hundreds of thousands of developers from more than 100 countries, even ones where Alexa is not available yet. As a platform, the goal should be to attract people to build on top of you. Developer attention is the currency today for platforms, especially worldwide-available platforms like smart assistants.

More than 3,500 brands are on Alexa

For all of the above, it's natural that brands are landing their voice strategy on Alexa first. Again, that will continue to be the case as long as the most developers are on the platform. That includes the tool ecosystem, which is bigger for Alexa; tools also tend to become available for Alexa first. So if you are a brand looking to shift your digital strategy towards voice, Alexa should be your starting point for these reasons alone.

Google Assistant gets bilingual support

The other relevant news for you starting this week is Google Assistant's bilingual support. As you know, I'm one hundred percent behind internationalization and translation as key elements for voice apps (I even went ahead and named them one of the infinity stones for voice), so naturally this is great news to hear. It also paints the picture for future development and sets user expectations at a new level. Once users get used to the feature, it's going to be expected from every assistant. In the promotional video you can also see how its use for learning new languages, or teaching kids new languages, is highlighted.

Apple announced their event for September 12

Last but not least, I'm also excited for the Apple event announced for September 12. I just want to see what they are gonna come up with, especially with Siri, the Shortcuts app, and any other conversational developments.

Before wrapping up, I want to thank you from the bottom of my heart for listening. The number of listens of the briefing is growing consistently, and we are reaching people from every latitude of the planet. Thank you. Now go ahead and keep making this happen by sharing it with someone who needs to know about voice platforms.

Videos demonstrating Google Assistant bilingual support:

  • By Google itself
  • And this is a video published by Tobias Goebel, VP at SparkCentral.

Alexa’s Contact and motion sensor APIs use cases

Hello there, Happy Sunday! I hope you are having a nice, relaxing day.

Earlier this week, the Alexa team announced the availability of contact and motion sensor APIs and their integration into Alexa.

Why do I think this is important? For the cases that are not being specifically advertised for the feature: improved access to home utilities for physically impaired people, for instance, or help for caregivers of kids, older adults, and patients. The contact and motion sensor APIs also let sensors connect to Routines in Alexa, automating all these tasks even further. The future of home automation gets more connected every day, so I'll do an episode in the coming weeks about the main players working on home automation.
Here are the main use cases as featured in the Alexa blog post (a sketch of the underlying sensor event follows the list):
Customers can view their connected sensors in the Alexa App, query their status by asking Alexa, and use sensors to activate Routines to control other connected smart home devices, say special phrases, play music, receive notifications, and much more. Customers can use their sensors to automate a wide variety of custom-built Routines, such as:

  • When motion is detected in the living room, Alexa can turn on the lights, and then turn off the lights after 30 minutes with no motion detected
  • A motion sensor within a Wi-Fi camera can turn on a light and send a notification to your phone
  • You can ask Alexa if a door or window is open before arming the home security system
  • A front door contact sensor can activate Alexa to announce that the front door is open
  • A pantry door contact sensor can turn the pantry lights on and off as the door opens and closes
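For developers, here is a rough sketch of what a skill-side event for one of these sensors could look like. This is an assumption-laden illustration based on the Smart Home Skill API's ChangeReport event and the Alexa.ContactSensor interface; the endpoint ID and bearer token are placeholders, and the exact payload should be checked against the official API reference.

```python
import json
import time
import uuid

# Hedged sketch: the shape of a proactive ChangeReport a smart home skill
# might send when a contact sensor opens, per the Alexa.ContactSensor
# interface. "front-door-sensor" and "ACCESS_TOKEN" are placeholders.
def contact_sensor_change_report(endpoint_id: str, opened: bool) -> dict:
    state = "DETECTED" if opened else "NOT_DETECTED"
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": "ChangeReport",
                "payloadVersion": "3",
                "messageId": str(uuid.uuid4()),
            },
            "endpoint": {
                "endpointId": endpoint_id,
                "scope": {"type": "BearerToken", "token": "ACCESS_TOKEN"},
            },
            "payload": {
                "change": {
                    # The change was caused by someone physically opening the door.
                    "cause": {"type": "PHYSICAL_INTERACTION"},
                    "properties": [{
                        "namespace": "Alexa.ContactSensor",
                        "name": "detectionState",
                        "value": state,
                        "timeOfSample": time.strftime(
                            "%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
                        "uncertaintyInMilliseconds": 500,
                    }],
                }
            },
        }
    }

print(json.dumps(contact_sensor_change_report("front-door-sensor", True), indent=2))
```

A report like this is what would let a Routine such as "announce when the front door opens" fire without the customer having to ask Alexa anything.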

That’s it for today.
Remember to subscribe, like, comment, and share these episodes. My name is Mari, and you can find me on Twitter as voicefirstlabs and on Instagram @voicefirstweekly. Thank you for listening, and have a great day!

Automatic transcription for video and audio files stored in OneDrive

Among Microsoft's announcements this week are a number of new features coming to OneDrive for Business and SharePoint that will use AI and machine learning technologies to help manage and collaborate on content stored in those services.
Starting “later this year,” Microsoft will be adding automated transcription to video and audio files stored in OneDrive and SharePoint. This transcription will use the same technology that Microsoft uses in its Microsoft Stream business video service. OneDrive and SharePoint video and audio files will become fully searchable thanks to these transcription services.

Microsoft is providing developers with natural language processing tools and cognitive services

One of the things I've noticed is that even if Cortana is not making a lot of noise in the smart assistant space, aside from the integration with Alexa, Microsoft has been providing great developer tools for language understanding and processing.
Microsoft has an enterprise focus, and automatic transcription can be a big deal for the content needs of many organizations. Say they later add translation to three more languages; that would be a game changer for collaboration in the enterprise.

That’s it for today
Remember to subscribe, like, comment, and share these episodes. My name is Mari, and you can find me on Twitter as voicefirstlabs and on Instagram @voicefirstweekly. Thank you for listening, and have a great day!