About the Author
The ultimate resource in the voice space. Conversational interfaces, voice interfaces, smart speakers and smart assistants, voice strategy, audio branding.

Voice in the workplace

Hello there.

What can voice technology do in the workplace? It’s already in our homes, where we are still finding ways to understand and communicate with each other, so the next logical step is for smart assistants to move into our workplaces. Forbes published an article about 12 exciting ways you can use voice-activated technology in the workplace. I thought it would be good to explore some of them, so I selected 5 to showcase here.

Number 1. Improving customer service support

This technology is leaps and bounds better than anything the telecom industry has produced vis-à-vis interactive voice response (IVR). Access to this technology means more natural customer service help via telephone and being able to handle simple issues.

Number 2. Augmenting Internal Workplace Education

Connect employees to internal resources on demand. This will let developers access extensive documentation or allow anyone to get questions answered about workplace policies, all with a voice-activated system.

Number 3. Integration With Business Apps

Businesses can use Amazon’s Alexa for a myriad of reasons, including helping employees work more efficiently, from scheduling meetings to keeping track of to-do lists and setting important reminders.

Number 4. Accessing Big Data More Easily

We are on the verge of accessing enterprise data through voice, where you can ask a question in plain spoken language and have a platform retrieve the answers, all integrated within enterprise applications. So, within Slack, you could ask, “What are Germany unit sales trends?” and immediately receive back the charts and data.

Number 5. Opportunities In Other Environments

I think there are excellent opportunities in other environments where modalities are different and workers aren’t sitting in front of a keyboard and screen all day. On an assembly line, for example, being able to speak commands to equipment would be immensely helpful. The same goes for a hospital environment, where a user could ask for necessary information without needing to use their hands.

Thanks for listening!

Does insurance have a voice?

I like looking into trends and observing how each industry is adopting these new platforms and which industries are doing it first and which ones are just watching. As Theo Lau put it in an article published in InsurTech:

For insurance industry, adopting voice technology can be viewed as a competitive advantage and an important driver of customer satisfaction.

Back in 2016, Liberty Mutual Insurance became the first insurance carrier to release an Alexa skill, enabling customers to get quotes and access a glossary of insurance jargon. Since then other companies have followed suit, like Prudential, Nationwide, MetLife and GEICO. I’ll leave the full list of insurance companies with voice capabilities in the episode notes. In terms of features, it can go from obtaining car insurance quotes, to tips on motivation and happiness, to answers to common insurance questions. Most of them provide clarification of language and terms. This is a great fit for voice applications. As I have said here before, anything that is repetitive in customer support, you know, the work where people give the same answer over and over in a day to different people, is the perfect case for voice applications. After all, what is an assistant for if not to help you get home and car quotes or find dental providers?

This is a list of insurance companies with voice capabilities.

The Voice Job Board

We have a big announcement today. As the conversational space in AI, voice applications, and smart speaker platforms grows, the need for talent to build the new applications of tomorrow will only grow. Creating content for VoiceFirst Weekly makes us see a lot of job postings, and we follow community questions and answer the ones we get asked all the time. During this time, we came to understand that soon enough there’s going to be a need for talent for voice applications, not only programmers and designers, but all kinds of creatives. Even now the people who are working in voice come from a variety of backgrounds. And as such we want to start providing help both to people who want to get into voice technologies and to companies looking to hire voice talent. And with that preamble, today we are launching the Voice Job Board. The Voice Job Board will allow you to post a job with us if you are a company or a business hiring for voice applications, and to find one if you are looking to enter the field. You can go to voicefirstweekly.com/jobboard and submit your postings. We will soon be releasing resources for learning and education in conversational applications, to help you find the people you need in voice. Stay tuned.

Thank you for listening, you have a great weekend. We’ll talk tomorrow.

Exploring modalities in voice

“Multi-modal, multi device, context aware with voice as the first interface, this is what we understand as #VoiceFirst.”

This was one of the first tweets we put out there, and it has become kind of our motto at VoiceFirst Labs.

Conversations are not only about voice

The next conversational experiences demand a screen, and companies understand this at a deep level. That’s why Google partnered with Lenovo to release the Smart Display. And it’s why Amazon, despite sales numbers not growing that much, keeps pushing the Echo Show. A few episodes back I talked here about how it’s not voice only, basically advocating against confusing VoiceFirst with conversational experiences that are driven only by voice.

If we are talking multiple modalities and multiple devices, which are they, and how do they come together?

Before being able to interact with our computers and devices via voice, we had the keyboard, then the mouse, then touch interfaces, until today, when we can talk to our phones, computers and smart speakers. In our newsletter yesterday I shared the concept of VoiceFirst 1.0, which is the state I consider we are in right now. We’ve seen some brands build voice-only experiences, and even abide by that like a religion. Others are just trying to figure out how they can be part of it or what their role is, and are using everything at their disposal. For me, the future is multimodal, the same way keyboards weren’t the only thing and touch won’t be the only one. What we are looking at is the ultimate quest: to communicate with our devices the way we do with each other. And we communicate with everything: with our hands, with our eyes, our bodies and our voices.

VoiceFirst 1.0

Now let’s get into definitions. In a pure sense, this state that I call VoiceFirst 1.0 is about one more modality, different from the keyboard, the mouse and the touch screen: the voice. For several of the applications we enjoy today, the only modality is voice. I have an Echo Spot and I rarely look at its screen, mainly because I don’t need to. There are exceptions, of course: Panda Rescue is a good example, as well as some games like 6 Swords. And there’s also another kind of application that provides assistance through voice, the voice-augmented kind, like a game (StarCraft 2) where you can control certain aspects with your voice but the main interface is not your voice, so you can still play it without voice assistance. Another example is what Snapchat is doing with its lenses: voice-activating features in their application. The future looks like a mixture of the modalities we have today, probably with less keyboard and more touch and voice interfaces, all mixed together depending on the context where you are using the app. Will it make sense to ask me to type in a car in a few years? No. The option should exist, but it will be way less used. But if you are on your phone at the bus stop, you are not gonna say out loud, “Text my lawyer: how is the lawsuit going?” You are going to type it.

Modalities and devices

There are augmented, activated, added and assisted experiences. And you can combine them with voice, screens and devices. Don’t try to restrict yourself to only one modality or device; use as many as make sense for the context of the problem you are trying to solve. I think as voice applications start to solve more complicated problems, the need for multimodality will be greater. As users interact with these experiences, their expectations change, and the need for multi-device will grow as well. As Dave Isbitski pointed out in his Voice Summit keynote:

Meet your users where they are.

Thank you for listening. You have a great day. This is a daily briefing, so we are gonna be here on Saturday and Sunday as well, but lemme wish you a happy weekend in case you’d rather do other activities than listen to podcasts.

Text to speech services analysis

While I was browsing Facebook, Roger Kibbe (voicecraft.ai), shout out to Roger, asked for text to speech services with voices that have more personality. Just in yesterday’s episode about internationalization, we talked about the customization of voices in different languages and accents with text to speech services. But when it’s time to put in real emotions, it might be a little harder to find the voices we want for our apps. Synthetic voices or text to speech services come in two fundamental modes of consumption: one is the download model and the other is streaming. Pricing schemes might be per minute, per request in the case of streaming, a combination of both, or flat pricing. So here is a compilation of voice services, their offerings and prices:

Text to speech services comparison

Cepstral: It’s a little pricey, so it’s probably for established companies. Their demo has tons of voices that you can customize by pitch and rate and add effects like Space Robot or Split Personality.

Acapela is a Belgian text to speech company. Acapela’s voice demo, the Acapela Box, has a collection of voices described as happy, bad guy, old man or child, among others. Acapela offers a voice banking service, preserving your own voice as synthetic speech. Other offerings include the creation of voices, a service for companies to differentiate themselves by adding a vocal dimension to their marketing strategy with an identifiable corporate sound. Pricing is based on a credit model where, for premium voices, credits correspond to the length of the text (roughly the number of characters). Prices range from 6 euros for 47 seconds of audio to 600 for 96 minutes of audio.

Voicery: The thing I didn’t love about this service is that they only have English voices. Given the current state of voice apps, I really prefer a service where I can choose from a range of voices in different languages. On the other hand, the voices did sound quite real. Like Acapela, Voicery provides the creation of voices and rights for companies to create their own voice. This is a streaming service and doesn’t offer support for on-device synthesis. They have two pricing packages: the enterprise one, where prices are provided on request, and the starter package, with up to 100 requests per second at 0.001 per character.

SpeechMorphing: I heard about SpeechMorphing at Voice Summit and it seems to have a high quality service. They don’t have demo voices; you need to request a demo, so I’ll get back to this one in later episodes. It does show that you can customize the style of the voices, which is promising.

Cereproc offers a streaming service and an SDK for developers. Really well crafted voices, the one with the most demo voices, and pretty realistic, at a reasonable price of 124 a month for 1,000,000 characters, with plans going up to almost 500 per month. Like other services, Cereproc offers voice creation and voice cloning as well. Voices are available in English, Dutch, French, Italian, Spanish, Portuguese, Japanese and other languages.

Talestreamer: High quality and the best quality-to-price ratio, with plans of 4/month for 25,000 requests and 16/month for 1,000,000 requests. Talestreamer is the service behind The Magic Door, in my opinion one of the best voice applications out there.

Lastly, Amazon Polly, Amazon’s speech synthesis service. You can try it for free, customizing it with SSML. It’s a great service for your voice apps or custom audio, but if you need something with more personality, or want to create your own voice for a brand identity, then you’d better stick with one of the others.
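To give an idea of what customizing with SSML looks like, here is a minimal sketch in Python that only builds an SSML string and checks that it is well-formed. The helper name `build_ssml` and the sample sentence are made up for illustration, and the `rate` and `pitch` values shown are the kind of keywords SSML’s `prosody` tag accepts; which ones a given Polly voice honors is something to check in the Polly documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap plain text in a <speak> document with a <prosody> tag.

    build_ssml is a hypothetical helper, not part of any SDK.
    """
    # escape() protects characters like & and < that would break the markup
    return (
        f'<speak><prosody rate="{rate}" pitch="{pitch}">'
        f"{escape(text)}"
        "</prosody></speak>"
    )


ssml = build_ssml("Thank you for listening & have a great day", rate="slow", pitch="high")

# Parsing raises an error if the string is not well-formed XML
ET.fromstring(ssml)
print(ssml)
```

Nothing is sent to Polly here. To actually hear the result, the string would be passed to the service as SSML input, which needs an AWS account and credentials.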

The winners are Talestreamer, which has really good voices at a reasonable price, and Cereproc, with voices in a lot of languages, an SDK plus a streaming service at a reasonable price, and the option to create more custom voices if needed. As always when choosing a service, it will depend on your needs.
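If you want to put the numbers mentioned above side by side, a rough sketch like the following works out the price per unit for each plan. The figures are the ones quoted in this episode; the currency is only stated for Acapela, and the units differ (minutes of audio, characters, requests), so treat the output as a starting point and not a like-for-like comparison. The helper name `unit_price` is just for illustration.

```python
def unit_price(total_price: float, units: float) -> float:
    """Price for a single unit (a minute, a character or a request)."""
    return total_price / units


# Figures as quoted in the episode; units are NOT comparable across services
prices = {
    "Acapela smallest pack (per minute of audio)": unit_price(6, 47 / 60),
    "Acapela largest pack (per minute of audio)": unit_price(600, 96),
    "Voicery starter (per character)": 0.001,
    "Cereproc entry plan (per character)": unit_price(124, 1_000_000),
    "Talestreamer 4/month plan (per request)": unit_price(4, 25_000),
    "Talestreamer 16/month plan (per request)": unit_price(16, 1_000_000),
}

for label, value in prices.items():
    print(f"{label}: {value:.6f}")
```

A request can carry many characters, so a per-request plan and a per-character plan only become comparable once you know how long your typical text is.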

Do you have a text to speech service to recommend? Have you tried any of these? Shoot us a message @voicefirstlabs on Twitter. In the episode notes at voicefirstweekly.com/flashbriefing/63 you can find the full transcript of the episode plus the links to each service mentioned.

Before wrapping up, I want to invite you to subscribe to our weekly newsletter. This morning we sent our issue number 15! Time flies. Thank you for listening, you have a great day and we will talk tomorrow!