I was watching videos of the Voice Summit talks I couldn’t attend, and Noelle’s talk gave me an idea for an app. So I started researching cognitive services: what the prices are, what one gives me that another doesn’t, and what trade-offs I’m willing to accept. I’m telling you, it’s real FOMO. What if I choose one and then it doesn’t have everything I want for my app? The fear is real. Why would you want to know about cognitive services? Because sometimes your app is not going to be powered by a smart assistant. You don’t have to start over; you can leverage cognitive services and start adding speech recognition or sentiment analysis to your current applications.
The comparison
So I decided to do a comparison. It lacks a real hands-on test, which would mean trying a set of the APIs and services and building a mini test app with each of the contenders, but I cringed at how much time that sounded like. Tick tock, you gotta choose!
The criterion that matters to me is this: even if I get excited tinkering with technology, there is only so much time and mental load you can bargain with. I’m familiar with AWS and how it works, so it’s easier for me to lean that way, because I can rely on previous, similar knowledge. That is something you should take into account as well, whether you are a company or a solo developer, when choosing a service or even a library. I have played with several of Google’s APIs, and when I say played I mean I haven’t deployed that code to a live production environment, where the problems are just different.
Having said that, the first one I looked at was Microsoft. I have been intrigued by their services for a while. I concentrated on Language and Speech. They also have Vision, Knowledge and Search services.
The fundamentals:
Speech
- Speaker recognition. Identifies who is speaking, given an input audio. They had examples and the comments were being answered; I liked that, good signs. The pricing table will be available in the episode notes. There is also an example in Java on GitHub.
- Text to speech. It has a demo for you to try; the voices are not that good. They do provide a custom voice service (https://cris.ai/Home/CustomVoice)
- The one I really liked is automatic speech transcription. With the news that they are adding it to OneDrive video and audio files, you have a strong use case.
- Speech translation. The advantage is that it can be used across devices, in 10 languages. I remember I was at a conference in Peru last year, and the keynote by a Microsoft presenter was automatically translated by Skype. The translation at the time was not exactly accurate, but it got me thinking about how far this could go. (https://azure.microsoft.com/en-us/services/cognitive-services/speech-translation/)
Language
- Text analytics, the main one here is sentiment analysis. (https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/)
- Translator text. Automatic language detection
- Bing spell check. Well, I don’t use Bing a lot.
- Language understanding with their platform LUIS.
- Content moderator. This one seems interesting to me. What if a lot of apps are relying on the same moderator rules? Who decides what’s accepted or not? But it’s also one of the only services I know of that provides this.
All these services are in preview, so I recommend you take that into account. I didn’t talk about all the pricing because it’s a lot; each individual service has its own pricing. Most models are priced per thousand transactions.
The thing about cloud services is that once you start adding services and the usage grows, you only know what you are paying when it’s too late. In general, for playing around or starting applications, the free tier should suffice.
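To make the "per thousand transactions" idea less abstract, here is a rough sketch in Python of how you could estimate a monthly bill before it surprises you. The function name, the free-tier allowance and the price are all made up for illustration; they are not Microsoft's actual numbers, and I'm assuming usage is billed in whole blocks of 1,000 transactions, which may not match how a given service rounds.

```python
import math


def estimate_monthly_cost(transactions, free_tier, price_per_thousand):
    """Estimate a monthly bill for a service priced per thousand transactions.

    All three inputs are hypothetical; check the real pricing page of the
    service you pick before trusting any number that comes out of this.
    """
    # Only the calls above the free allowance are billable.
    billable = max(0, transactions - free_tier)
    # Assumption: billing happens in whole blocks of 1,000 transactions.
    blocks = math.ceil(billable / 1000)
    return blocks * price_per_thousand


if __name__ == "__main__":
    # Made-up usage levels: tinkering, a small app, an app that took off.
    for monthly_calls in (4_000, 50_000, 1_000_000):
        cost = estimate_monthly_cost(
            monthly_calls, free_tier=5_000, price_per_thousand=1.50
        )
        print(f"{monthly_calls:>9,} calls -> ${cost:,.2f}")
```

Running it for a few usage levels shows the pattern: while you stay under the free allowance the cost is zero, and after that it grows with every extra thousand calls, for each service you add.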
This will continue tomorrow!