Power up your apps with cognitive services. An analysis – part 2

Amazon AWS:

Amazon's machine learning services use a pay-as-you-go pricing model and provide the scalability AWS services are known for. Their heaviest marketing point is that it's the same technology that powers Alexa.

  • Amazon Lex, for building conversational interfaces and bots. It provides automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognize the intent of the text.
  • Amazon Polly is a text-to-speech service that synthesizes speech, with multiple languages and voices to choose from. I have made some demos with Amazon Polly, and I like that you can download the audio or stream it directly to S3 from the web page, without even calling the API. Useful for voice apps. The free tier includes 1 million characters per month.
  • Amazon Rekognition, with a k. https://aws.amazon.com/rekognition/?nc2=h_a1 It identifies objects, people, text, scenes, and activities, and detects inappropriate content. Files to analyze generally live in S3 (stored video must; images can also be sent as raw bytes in the request).
  • Amazon Translate https://aws.amazon.com/translate/?p=tile Translation services. The advertised use case is a curious one, at least to me: enabling multilingual sentiment analysis of social media content.
  • Amazon Transcribe provides automatic speech recognition for audio or video in common formats. One use case is transcribing customer service calls. The service adds timestamps to the text, which can be really useful for captioning and subtitles. You can even try it right away from the console (https://console.aws.amazon.com/transcribe/home?region=us-east-1#createJob)
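To make the captioning point concrete, here is a minimal sketch that turns word-level timestamps into SRT subtitles. It assumes the transcript JSON has the shape Amazon Transcribe documents for its output file (`results.items`, each with `type`, `alternatives`, and `start_time`/`end_time` on spoken words). The function names and the `max_words` cue size are my own choices for illustration, not part of any AWS SDK.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcript_to_srt(transcript: dict, max_words: int = 8) -> str:
    """Group timestamped words into cues of at most max_words words.

    `transcript` is assumed to be the parsed JSON output of a
    transcription job; max_words is an arbitrary illustrative limit.
    """
    cues = []                      # (start, end, text) tuples
    words, start, end = [], 0.0, 0.0
    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation carries no timestamps; glue it to the previous word.
            if words:
                words[-1] += content
            continue
        if len(words) >= max_words:
            # Current cue is full: close it before starting a new one.
            cues.append((start, end, " ".join(words)))
            words = []
        if not words:
            start = float(item["start_time"])
        words.append(content)
        end = float(item["end_time"])
    if words:
        cues.append((start, end, " ".join(words)))

    blocks = [
        f"{i}\n{srt_time(s)} --> {srt_time(e)}\n{text}\n"
        for i, (s, e, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

You would load the job's output file with `json.load` and write the returned string to a `.srt` file. A real captioning tool would probably also split cues on pauses and line length, which this sketch ignores.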

Google Speech and text processing services and APIs

  • Cloud Natural Language API provides text analysis. What Google can take advantage of here is entity recognition: it can recognize companies, consumer goods like phones, and locations as entities. https://cloud.google.com/natural-language/
  • Cloud Speech API: speech-to-text recognition. Google's strong point is that it recognizes 120 languages. It is split into models such as command and search, phone calls, and video, each optimized for its case; the video model, for example, targets transcription like the one YouTube provides. https://cloud.google.com/speech-to-text/
  • Cloud Translation API https://cloud.google.com/translate/
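As a rough illustration of the entity recognition mentioned above, the sketch below groups entities from a Natural Language API response by type. It assumes the response is the parsed JSON of an entity analysis call, with an `entities` list whose items carry `name`, `type`, and `salience`. The helper name `entities_by_type` and the `min_salience` filter are hypothetical, added only for this example.

```python
from collections import defaultdict


def entities_by_type(response: dict, min_salience: float = 0.0) -> dict:
    """Group entity names by their type (e.g. ORGANIZATION, LOCATION).

    `response` is assumed to be the parsed JSON of an entity analysis
    result; min_salience is an illustrative filter, not an API parameter.
    """
    grouped = defaultdict(list)
    for entity in response.get("entities", []):
        # Salience is the API's estimate of how central the entity is to the text.
        if entity.get("salience", 0.0) >= min_salience:
            grouped[entity["type"]].append(entity["name"])
    return dict(grouped)
```

With this, picking out only the companies or only the products mentioned in a batch of text is a dictionary lookup on the result.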

The winner: There is no winner, there never is. I spent two episodes showing you the trade-offs, so choose accordingly. Do you need 100 languages? Choose Google. Do you need content moderation? Choose Microsoft services. Do you have other applications on AWS services like S3? Then choose Amazon. Above all, if the deliverable is due soon, I choose whatever the team is most experienced with.

Pricing varies per service, and I encourage you to evaluate it before choosing one.

Other comparisons

I didn’t include IBM Watson in this analysis, so I’ll leave some other comparisons that might be useful and that do cover it:


About the Author
The ultimate resource in the voice space. Conversational interfaces, voice interfaces, smart speakers and smart assistants, voice strategy, audio branding.
