Hello everyone, happy Sunday, it’s really sunny and beautiful today, I hope you are having a lovely one wherever you are listening.
In this episode we give a (short) list of terms you need to understand in voice. Let’s begin with the most basic ones:
Smart speakers are devices that can be commanded by voice, with an integrated virtual assistant that responds to actions after a wake word. Some can also control home automation devices, like light switches, or control your TV. We have talked about Alexa, Cortana and Google Assistant here, but there are several other smart assistants: Alice, by the Russian search engine parent company Yandex; AliGenie, by Alibaba; and Xiaowei, by Tencent. A skill is a capability of Alexa. Alexa provides a set of built-in skills, and developers can use the Alexa Skills Kit to give Alexa new skills. An action is the equivalent for Google devices.
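To make the skill idea concrete, here is a minimal, library-agnostic sketch of what a skill’s backend does: take the intent the platform has already extracted from the user’s utterance and map it to a handler that returns a spoken reply. The intent names, field names and handlers here are illustrative assumptions, not the actual Alexa Skills Kit API.

```python
# Toy skill backend: map an already-recognized intent to a spoken reply.
def handle_request(request):
    handlers = {
        "HelloWorldIntent": lambda slots: "Hello from your new skill!",
        "GoodbyeIntent": lambda slots: "Goodbye!",
    }
    intent = request.get("intent", "")
    slots = request.get("slots", {})
    reply = handlers.get(intent, lambda s: "Sorry, I can't do that yet.")(slots)
    return {"outputSpeech": reply}

# The voice platform has already turned the user's utterance into an intent for us.
print(handle_request({"intent": "HelloWorldIntent", "slots": {}}))
```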
Wake word: the word that makes an always-listening speaker wake up and listen so it can answer user prompts.
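As a toy illustration of how a wake word gates an always-listening device, here is a tiny sketch: everything is transcribed, but only what follows the wake word is acted on. The wake word “computer” and the pre-transcribed chunks are made-up examples.

```python
WAKE_WORD = "computer"
# Pretend these are transcriptions of consecutive audio chunks.
transcribed_chunks = ["some background chatter", "computer what's the weather"]

for chunk in transcribed_chunks:
    text = chunk.lower()
    if WAKE_WORD in text:
        # Only what follows the wake word is treated as a user prompt.
        prompt = text.split(WAKE_WORD, 1)[1].strip()
        print("Handling prompt:", prompt)
    # Chunks without the wake word are simply ignored.
```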
There are a lot of other terms, like:
Natural language processing
Speech recognition or Automatic speech recognition
Speaker Recognition
Artificial Neural Networks
Natural Language Understanding (NLU), and the list goes on. It’s important to have at least heard of these, whether you are a voice interface designer, an entrepreneur or a developer. Most people building skills don’t need to know them by heart, but I certainly suggest getting a basic understanding of the terminology even if you are not building the underlying technology. Can’t hurt!
Natural language processing: technology that extracts the “meaning” of a user’s utterance or typed text. A meaning usually consists of an “Intent” and “Name-Value” pairs. The utterance “I want to book a flight from Washington, DC to Boston” has the Intent “Book-a-Flight”, with the Name-Value pairs being “Departure City” = “Washington, DC” and “Arrival City” = “Boston, MA”. An NLP system takes the flat sequence of words, “I want to book a flight from Washington, DC to Boston,” and produces a “meaning structure” (usually a JSON object) that boils the sequence of words down to an Intent and Name-Value pairs. The JSON object can then be inspected by what is often called “middleware” software, which can easily extract the information in the object and execute additional business logic (e.g., retrieve available flight information, or ask for missing information such as “What date would you be flying out of Washington, DC?”).
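Here is a small sketch of what that meaning structure might look like for the flight example, together with toy middleware that inspects it. The exact field names (“intent”, “slots”) vary by NLP vendor, so treat them as assumptions for illustration.

```python
import json

# The kind of JSON "meaning structure" an NLP system might return for
# "I want to book a flight from Washington, DC to Boston".
nlp_output = json.loads("""
{
  "intent": "Book-a-Flight",
  "slots": {
    "Departure City": "Washington, DC",
    "Arrival City": "Boston, MA"
  }
}
""")

def middleware(meaning):
    """Toy middleware: run business logic or ask for missing information."""
    if meaning["intent"] == "Book-a-Flight":
        slots = meaning["slots"]
        if "Departure Date" not in slots:
            # A required value is missing, so prompt the user for it.
            return "What date would you be flying out of " + slots["Departure City"] + "?"
        return "Searching flights from {} to {}...".format(
            slots["Departure City"], slots["Arrival City"])
    return "Sorry, I didn't understand that."

print(middleware(nlp_output))
```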
Speech recognition: a machine’s ability to identify spoken words and translate them into a machine-readable format. It’s the base technology in the smart assistants. It all starts with recognizing the user’s speech, and that’s why rolling out updates in different languages is not an easy task for the companies involved.
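For a hands-on feel, here is a minimal speech-to-text sketch assuming the third-party Python package SpeechRecognition is installed and a local recording named utterance.wav exists; both are assumptions for illustration, and this is just one possible route, not how the smart assistants implement recognition internally.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("utterance.wav") as source:
    audio = recognizer.record(source)  # load the whole file into an AudioData object

try:
    # Send the audio to a hosted recognizer and get back plain text.
    text = recognizer.recognize_google(audio, language="en-US")
    print("Recognized:", text)
except sr.UnknownValueError:
    print("Speech was unintelligible.")
except sr.RequestError as err:
    print("Recognition service unavailable:", err)
```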