“Multimodal, multi-device, context-aware, with voice as the first interface: this is what we understand as #VoiceFirst.”
This was one of the first tweets we put out, and it has become something of a motto for us at VoiceFirst Labs.
Conversations are not only about voice
The next generation of conversational experiences demands a screen, and companies understand this deeply. That’s why Google partnered with Lenovo to release the Smart Display, and why Amazon keeps pushing the Echo Show even though its sales numbers are not growing much. A few episodes back I talked here about how VoiceFirst is not voice-only, essentially arguing that we should not confuse VoiceFirst with conversational experiences driven only by voice.
If we are talking about multiple modalities and multiple devices, what are they, and how do they come together?
Before we could interact with our computers and devices by voice, we had the keyboard, then the mouse, then touch interfaces, until today, when we can talk to our phones, computers and smart speakers. In our newsletter yesterday I shared the concept of VoiceFirst 1.0, which is the stage I think we are in right now. We’ve seen some brands build voice-only experiences, and some even follow that approach like a religion. Others are still trying to figure out how they can be part of it, or what their role is, and are using everything at their disposal. For me, the future is multimodal, in the same way that the keyboard wasn’t the only interface and touch won’t be the only one either. What we are looking at is the ultimate quest: communicating with our devices the way we communicate with each other. And we communicate with everything: our hands, our eyes, our bodies and our voices.
Now let’s get into definitions. In a pure sense, what I call VoiceFirst 1.0 adds another modality, distinct from the keyboard, the mouse and the touch screen: voice. For many of the applications we enjoy today, voice is the only modality. I have an Echo Spot and I rarely look at its screen, mainly because I don’t need to. There are exceptions, of course: Panda Rescue is a good example, as are some games like 6 Swords. There is also another kind of application, voice-augmented, that provides assistance through voice. In StarCraft 2, for example, you can control certain aspects of the game with your voice, but voice is not the game’s main interface, and you can still play without voice assistance. Another example is what Snapchat is doing with its lenses: voice-activated features inside the app.
The future looks like a mixture of the modalities we have today, probably with less keyboard and more touch and voice, all mixed together depending on the context in which you are using the app. Will it make sense to ask me to type in a car a few years from now? No. The option should exist, but it will be used far less. But if you are on your phone at a bus stop, you are not going to say out loud, “Text my lawyer: how is the lawsuit going?” You are going to type it.
Modalities and devices
There are augmented, activated, added and assisted experiences, and you can combine them across voice, screens and devices. Don’t restrict yourself to a single modality or device; use as many as make sense for the context of the problem you are trying to solve. I think that as voice applications start to solve more complicated problems, the need for multimodality will grow. And as users interact with these experiences, their expectations will change and the need for multiple devices will grow as well. As Dave Isbitski pointed out in his Voice Summit keynote:
Meet your users where they are.
Thank you for listening, and have a great day. This is a daily briefing, so we will be here on Saturday and Sunday as well, but let me wish you a happy weekend in case you would rather do something other than listen to podcasts.