
Smart speakers are here, and they are changing not only how we interact with technology but also how we get things done. As everyday users increasingly rely on them and businesses ramp up to use them, Voice Apps are now the new frontier for app design. Have you wondered what it would take to design an app that runs on them?

People are now using their voice and smart speakers for just about everything - from listening to music and catching up on the news to checking the weather and setting daily reminders. Forecasts show that by 2021, the use of smart speakers will top that of tablets.

Voice interactions go beyond just smart speakers – with most phones having built-in Voice capabilities, users are starting to use voice applications there as well.

Part of the reason behind this growth in voice usage is that Voice is a ‘natural’ interface. In other words, we as humans learn to talk very early in life. We see this in how we generally prefer talking to others over sending text messages. Voice as an interface can be as intuitive as a typical conversation. Voice, however, does come with its challenges…

A Little Context

The first smart-phone apps were clunky, awkward, and hard to use. App builders had not yet figured out how to properly bring the capabilities of computers to such tiny devices.

When the iPhone launched, app builders learned how to take features and style them for the smart-phone screen. End-users rewarded these well-designed apps.

Even though web design and mobile app design are different, they have a lot in common - buttons, text fields, and dropdown menus, to name a few. Voice Apps are a similar story: designing them is different, but they have a lot in common with web and mobile apps.

A Unique Medium

It is worth starting by noting that, in many ways, Voice is unique as an interaction medium.

Voice Apps (and conversations in general) are temporal. When you speak a request, you need to wait for the response to finish before moving on with your task. Users cannot speed through the apps, which means they can quickly become bored or impatient.

Voice Apps are also hard to use privately. Since requests to apps need to be spoken and often the response is said aloud, interactions by default become public.

Lastly, voice is pretty ineffective for many types of information, such as lists or relationships between items. Anything that relies on visuals like graphs or tables can be challenging to communicate purely with voice.

Opportunities and Challenges

The uniqueness of Voice does come with incredible opportunities. The most obvious one is that Voice Apps take advantage of an interaction paradigm that we, as humans, are very familiar with - even before being exposed to any technology. Because voice interfaces are intuitive, more people can use them easily, and training costs are lower when getting them adopted. However, what often gets forgotten is that Voice Apps still need to be designed to be intuitive, i.e., they need to support usage in many situations.

To put it simply, an intuitive Voice App for a pizzeria will likely need to support many requests, including “send me a pizza”, “which of your pizzas are the healthiest”, and “how long would it take for a pizza to arrive here”. While it is not necessary to support every request, choosing which requests to support, and especially which to leave out, needs to be done carefully.
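One way to picture that choice is as an explicit list of supported requests plus a deliberate fallback for everything else. The sketch below is only an illustration: the intent names, the sample phrasings, and the exact-match approach are all assumptions made for this example, and real voice platforms use far more flexible language understanding.

```python
import re

# Hypothetical intents for the pizzeria example. The names and sample
# phrasings are illustrative assumptions, not taken from a real platform.
SUPPORTED_INTENTS = {
    "order_pizza": ["send me a pizza", "order a pizza"],
    "healthiest_pizza": ["which of your pizzas are the healthiest"],
    "delivery_time": ["how long would it take for a pizza to arrive here"],
}

# Anything outside the supported list lands here on purpose, so the app
# can respond gracefully instead of guessing.
FALLBACK_INTENT = "unsupported_request"


def normalize(utterance: str) -> str:
    """Lowercase and strip punctuation so small variations still match."""
    return re.sub(r"[^a-z0-9 ]", "", utterance.lower()).strip()


def resolve_intent(utterance: str) -> str:
    """Return the matching intent name, or the fallback for anything else."""
    cleaned = normalize(utterance)
    for intent, samples in SUPPORTED_INTENTS.items():
        if cleaned in (normalize(sample) for sample in samples):
            return intent
    return FALLBACK_INTENT
```

Writing the fallback down as its own case is the point of the sketch: what the app will not do is a design decision too, and it deserves a considered response.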

The temporal nature of voice means that people are generally comfortable receiving and providing short responses. These quick responses allow for increased usage, but they also mean that smart speaker users are often multi-tasking when using a Voice App and therefore need functionality that does not get in the way of what they are doing. The need for efficient conversations also means that users expect Voice Apps to grow with them, so that the apps stay quick to use as users gain experience.

The fact that Voice Apps are hard to use privately does create challenges, but it also calls for thinking about Voice App usage in the same way we think about where and when private conversations happen today. In addition, the public nature of voice invites exploring collaborative and multi-user scenarios for Voice Apps.

Similarities with all Apps

While voice as a medium brings a lot of opportunities and challenges, it is worth remembering that Voice Apps also share a lot of similarities with other apps.

To start with, creating a truly great app on any platform, be it smart-speakers or smart-phones, means having designers work closely with developers. With Voice Apps, designers often end up taking the lead on making the conversation feel great and balancing that with the actions that the user needs to take. At the same time, developers take the lead in building out the logic for the App.

In addition, most fundamentals of app interaction remain the same with Voice. What we call events in mobile and web apps are known as intents in Voice Apps. It is just how those interactions happen that has changed. Clicking and tapping are intuitive ways to trigger an App’s features, and the utterances that people speak (which express intents) play the same role for Voice Apps.
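The parallel between events and intents can be sketched in a few lines. Everything here is hypothetical: the `App` class, the trigger names, and the weather handler are invented for illustration and do not mirror any particular framework's API.

```python
from typing import Callable, Dict

Handler = Callable[[], str]


class App:
    """Routes a named trigger to the function that handles it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, trigger: str, handler: Handler) -> None:
        """Register a handler for an event name or an intent name."""
        self._handlers[trigger] = handler

    def dispatch(self, trigger: str) -> str:
        """Run the handler for a trigger, or apologize if none exists."""
        handler = self._handlers.get(trigger)
        if handler is None:
            return "Sorry, I can't help with that."
        return handler()


def check_weather() -> str:
    # Placeholder logic; a real app would call a weather service here.
    return "It is sunny today."


# Screen-based app: the trigger is an event raised by a tap or click.
screen_app = App()
screen_app.on("weather_button_clicked", check_weather)

# Voice App: the trigger is an intent resolved from a spoken utterance.
voice_app = App()
voice_app.on("CheckWeatherIntent", check_weather)
```

The same handler serves both apps. Only the name and origin of the trigger differ, which is why so much of the logic behind an app carries over to voice.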

Whether web, mobile, or voice based, all apps are designed to help a user complete an action of some sort. To do this, they gather, process, and provide data - often using what is referred to as a ‘form’, with each piece of data being accessed using a ‘control’ of some sort. This core mechanism of interacting with apps remains unchanged regardless of whether users are pushing buttons or giving voice commands.

At the end of the day, all apps operate within the fundamentals of basic app design. They need intuitive nesting of these controls, which can take the form of menus, sub-menus, tabs, and accordions. So while the organization of an app’s functions remains the same with Voice Apps, what changes is how the user navigates through them.
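To make the ‘form’ and ‘control’ idea concrete for voice, here is a minimal sketch in which each control is filled by a spoken answer rather than a tap. The `VoiceForm` class, its method names, and the pizza prompts are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VoiceForm:
    """A form whose controls are filled by spoken answers instead of taps."""

    # Maps each control name to the question asked to fill it.
    prompts: Dict[str, str]
    # Holds the answers collected so far, keyed by control name.
    values: Dict[str, str] = field(default_factory=dict)

    def next_prompt(self) -> Optional[str]:
        """Return the question for the first unfilled control, if any."""
        for name, prompt in self.prompts.items():
            if name not in self.values:
                return prompt
        return None

    def fill(self, name: str, value: str) -> None:
        """Store the user's answer for one control."""
        if name not in self.prompts:
            raise KeyError(f"Unknown control: {name}")
        self.values[name] = value

    def is_complete(self) -> bool:
        """True once every control has a value."""
        return self.next_prompt() is None


# Hypothetical pizza order with two controls.
order = VoiceForm(prompts={
    "size": "What size pizza would you like?",
    "topping": "Which topping would you like?",
})
```

On a screen, both controls would be visible at once. With voice, the app asks for one at a time, which is where the temporal nature of the medium shows up in the design.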

From here

If the differences between a Voice App and other types of apps scare you, remember that the fundamentals are similar in many ways. With the growing usage of Voice Apps, it is a good idea to get started building them. Doing so will allow you to internalize the differences and the strengths of the medium.

There are multiple ways to start. If you are not technically inclined or are new to app development, Alexa Skill Blueprints is an excellent place to start. If you are technically experienced, then Violet, an open-source application framework for building Voice Apps, will suit you just fine.

The nature of Voice as a computing interface and the recent technological advances make it an exciting opportunity.

What do you think? I would love to hear your thoughts.

Image Credit: Voice by Eucalyp from the Noun Project.
Updated: 5th Jan 2020 - General improvements.
