VOICE Summit 2018 Day One Recap
Attending VOICE Summit or just wish you had? My publisher, Manning Publications, is offering 40% off my book Voice Applications for Alexa and Google Assistant as well as any other book they sell. Use code ctwvoice18 when you check out.
Day one of the 2018 VOICE Summit in Newark, New Jersey, is coming to a close. The audience for the first day has been a mix of those just coming around to voice and those who have been around for a while, though not many developers overall; this is a business- and user-experience-oriented conference so far. I took today to meet with people and attend sessions. My notes on the trends I’m seeing will come after day three, but in the interim, here’s what I heard in the sessions.
Keynote with Dave Isbitski
This year’s conference is at the New Jersey Institute of Technology, which dovetailed well with the keynote speaker, Dave Isbitski, who happened to be an alum. Isbitski is the Chief Evangelist for Alexa, and a face you’ll see at nearly any large conference that touches on voice. His keynote didn’t announce anything new for Alexa, but gave some stats and background on where Alexa stands today. There was some good news for Amazon, as you’d expect. Consumers purchased more than a million smart devices in the US on Prime Day. There are now more than 45,000 skills.
However, there were also some numbers that looked rosy only when viewed from the right angle. Skill engagement has grown 50% year over year, but Isbitski didn’t say whether this was on a per-user basis or overall. If the average user is engaging with skills 50% more than last year, this is great! That’s not how I read the number, though; read as an overall figure, it implies that skill usage is growing more slowly than Alexa usage itself. This fits the second figure, that four out of every five Alexa users have used Alexa skills. Put another way, 20% of Alexa users have never used a third-party skill, which shows room for improvement.
Isbitski’s keynote played broadly to the audience, but even for those who have been around for a while, watching Amazon’s messaging change over time is interesting. This go-around focused more on monetization, as you’d expect given the developments on that front in 2018. There was also more focus in this presentation on how to determine what to build. I found his utility framework noteworthy, with the scale running from low utility to high:
- Browsing
- Telling
- Searching (specifically scoped searches)
- Doing
Finally, Isbitski offered some thoughts on the future of Alexa. There were no announcements or hints of new features, but instead a recap of the recent past: Gadgets, Alexa for Business, and Blueprints.
It’s All Been Said Before
Next up was Phillip Hunter of Pulse Labs. He’s worked in voice for 20 years and is tired of seeing voice-first designers make the same mistakes the IVR space already made.
He started off with what voice is not. Voice is not the web, and voice is not mobile. Voice is not a “natural” conversation, and voice is not an altogether new kind of vocal paradigm. If you’ve been working with voice for more than a few months, you might find these statements obvious, but think back to your first days. You probably thought that a voice UI needed to be as natural as possible. Perhaps you thought you could take the information from your website and put it directly into a skill. Beginners need to hear these precepts from people with deep experience.
Even though a voice UI is not meant to completely mimic a real conversation, it still needs to let the user speak naturally. In Hunter’s words, “let people talk the way they talk.” When we talk about coming to where the user is (a major theme of the first day), this is what it means. You can’t train users in a voice interface, or provide tooltips to guide them. Voice application developers need to take on the burden and the work to save time for the users. For a few years in the early 2000s, a website competed only against other websites. Then mobile apps began competing against websites. Voice applications are competing against both mobile apps and websites. If a voice experience isn’t easier for the user, the user has other options.
Hunter said the “encourage[ment] and reward[ing] of spontaneity” is missing from many voice applications. The user’s path to success will be winding, and voice designers need to accept that users are going to want to go off-path. Hunter views the goals of a good voice UX as “orchestration, anticipation, and predictability.” Predictability is not saying the same phrases repeatedly; it’s a user’s confidence that spontaneity will be handled.
Quick Tip: Stop saying “welcome to” in voice applications. Who does that in real life? No one. Why are voice apps doing it? Stop it.
Surviving the Shift to Voice First
Doug Robinson, CEO of Fresh Digital Group, spoke from an agency perspective. Fresh Digital Group has built skills for UNICEF and the Emmy Awards, and that experience informed his talk. He’s not a designer or a developer, for whom business metrics are often secondary; for him, engagement and its “worth” to a “brand” are paramount. He believes that brands are building what’s easiest to create and deploy, without wondering whether they should be building it at all. They’re in the “can we” phase, rather than the “should we” phase.
Quick Tip: Having trouble retaining users? Aren’t we all. Robinson says that 3–6% is a good voice app retention rate.
My Three Year Old Broke Your App
Jeremy Wilken gave the most actionable session I saw on the first day.
400 million children under the age of fifteen will be accessing voice applications by 2020, according to Wilken. This creates an opportunity, certainly, but problems will come along with it. Kids’ voices are thinner and carry less vocal data than adults’. They have smaller vocabularies and imperfect grammar. Their attention is scattered, and they are “emotional firecrackers.” They bring the same problems that exist with adults, but amplified. (Certainly no adult has ever cursed at Alexa…)
One example that broke Wilken’s heart was his daughter’s sadness that “Google didn’t hear [her].” In truth, she didn’t see the visual indicators, or she didn’t understand their meaning. She cheered up again as soon as the Google Home responded, but this situation could have been smoother had Google provided an immediate, verbal acknowledgement: “I hear you, and I’m looking into it.” Visuals need to be additive in a voice-first setting. Want to see what this is like? Add intentional latency into your voice application and feel the user experience for yourself. Then, post-launch, look at the logs for long response times and find opportunities for optimization.
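To make that concrete, here’s a minimal sketch of one way to do both: wrap a handler so a test build can inject artificial delay, and log response times for post-launch review. The request, response, and handler shapes here are hypothetical stand-ins, not any particular voice SDK’s API.

```typescript
// Hypothetical shapes, standing in for whatever your skill framework provides.
type VoiceRequest = { intent: string; utterance: string };
type VoiceResponse = { speech: string };
type Handler = (req: VoiceRequest) => Promise<VoiceResponse>;

// Wraps any handler to optionally inject artificial latency (to feel the lag
// pre-launch) and to log response times (to find slow spots post-launch).
function withLatencyInstrumentation(handler: Handler, artificialDelayMs = 0): Handler {
  return async (req) => {
    if (artificialDelayMs > 0) {
      // Pre-launch: simulate a slow backend (e.g. 2000ms) and sit in the silence.
      await new Promise((resolve) => setTimeout(resolve, artificialDelayMs));
    }
    const start = Date.now();
    const res = await handler(req);
    // Post-launch: ship this to your logging pipeline and watch for outliers.
    console.log(`intent=${req.intent} responseTimeMs=${Date.now() - start}`);
    return res;
  };
}
```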
Children and adults alike are sometimes going to speak in “non-standard” ways. This is important to me: I grew up around Southern accents, and voice assistants aren’t good with the syrupy speed and pronunciation. Certainly, my French accent is horribly non-standard. Expecting STT to understand these perfectly at this point in the technology is unreasonable. Expecting people to speak perfectly is unreasonable, too. Test a voice app using incorrect words. Bring in different voices and accents (try Applause if you want to do this at scale). Jump in the shower and speak to the device. Then change the voice app based on what you’ve learned.
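In that spirit, here’s a tiny smoke-test sketch that runs deliberately messy utterances through a skill. The `invokeSkill` hook is an assumption, standing in for however you drive your skill under test (a local invocation, a simulator API, and so on).

```typescript
// Deliberately messy inputs: disfluencies, homophones, accent-flavored spellings.
const messyUtterances = [
  'uh play the the thing',
  "what's the whether today",
  'gimme mah schedule',
];

// `invokeSkill` is assumed: it sends text to the skill and returns the response.
async function messySmokeTest(invokeSkill: (utterance: string) => Promise<string>) {
  for (const utterance of messyUtterances) {
    const response = await invokeSkill(utterance);
    // A graceful response never dead-ends; it should always offer a next step.
    console.log(`"${utterance}" -> "${response}"`);
  }
}
```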
For the content, make sure your app can handle small talk. It’s a small addition, one that users only half-expect but are fully disappointed by when it’s missing. Make sure the app offers help when it’s needed (escalating the detail as necessary), and make sure a user can stop or exit at any time. Test all of this by continuing to chat with the app until it gives up or breaks. It will break at some point; the question is when. Don’t test just a single thing at a time; really go deep. Jumble up the paths. This is difficult for the designer or developer, but try to be silly about it.
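As a rough illustration, here’s a minimal intent-router sketch covering small talk, escalating help, and an always-available exit. The intent names and messages are hypothetical; a real skill would lean on its platform’s built-in help, stop, and cancel intents.

```typescript
// Escalating help: each repeated request for help gets more concrete.
const helpMessages = [
  "You can ask for today's schedule or search for sessions.",
  "Try saying: what's on at two, or: find talks about design.",
  "Let's keep it simple. Say 'schedule' and I'll read today's sessions.",
];

// Intent names are illustrative; `session` would live in per-user session state.
function routeIntent(intent: string, session: { helpLevel: number }): string {
  switch (intent) {
    case 'SmallTalkIntent': // "how are you?", "thank you"
      return 'Doing well, thanks for asking. What can I find for you?';
    case 'HelpIntent': {
      const message = helpMessages[Math.min(session.helpLevel, helpMessages.length - 1)];
      session.helpLevel += 1;
      return message;
    }
    case 'StopIntent':
    case 'CancelIntent': // stopping must work from anywhere, at any time
      return 'Goodbye!';
    default:
      return "I didn't catch that. Say 'help' if you're stuck.";
  }
}
```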
Overall, Wilken’s talk was an exhortation to try to break your voice application. Consider the worst cases, and the weak points.
Quick Tip: Consider capturing a “feedback” intent, where users can provide feedback about their experiences.
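One way this might look, as a sketch: a feedback intent whose free-form slot (on Alexa, something like the AMAZON.SearchQuery slot type) captures the transcribed feedback for later review. The handler shape and `saveFeedback` helper are illustrative assumptions.

```typescript
// Illustrative only: receives the free-form slot value and stores it.
async function handleFeedbackIntent(userId: string, feedbackText: string): Promise<string> {
  await saveFeedback({ userId, feedbackText, receivedAt: new Date().toISOString() });
  return "Thanks! I've passed your feedback along.";
}

// Stand-in for persistence; in production, write to a database or analytics tool.
async function saveFeedback(entry: { userId: string; feedbackText: string; receivedAt: string }): Promise<void> {
  console.log('feedback', JSON.stringify(entry));
}
```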