Activation models
Users need a simple, discoverable way to activate or invoke voice interactions in your app. The activation method should align with the aesthetic and theme of the app or game, as well as its interaction model. Below are several examples of common custom activation methods.
UI affordance
A UI affordance is an element within the user interface that can be used for activation, such as a mic button or an exclamation point above a character's head. Common use cases include keyboard dictation and search activities.
Pros
- High discoverability and reliability—it’s easy for the user to find and activate
- Low learning curve—it’s easy for new users to learn
- High input device adaptation—it's easy to transfer the affordance to a different device
Cons
- Easy to overload the paradigm—too many icons or affordances can cause confusion
- Increases visual density of the interface—adds an additional element to the interface
- Not always accessible—if visual, it may not be accessible to someone with a visual impairment or when hand tracking is being used
Examples
Controller click on microphone button
Hand pinch click on UI button
Hand direct touch on UI button
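A UI-affordance activation can be reduced to a toggled listening state plus a visible state indicator. The sketch below is a minimal, framework-agnostic illustration; the class and method names are assumptions, and a real app would start or stop its speech recognizer inside the click handler.

```python
class MicButton:
    """Minimal sketch of a UI-affordance activation: a mic button that
    toggles a listening state and exposes a label for the visual affordance.
    Hypothetical names; a real app would wire this to its speech recognizer."""

    def __init__(self):
        self.listening = False

    def on_click(self):
        # Toggle the listening state; start/stop the recognizer here in a real app.
        self.listening = not self.listening
        return self.listening

    def affordance_label(self):
        # Clearly conveying state helps users trust the interface.
        return "Listening..." if self.listening else "Tap to speak"
```

Keeping the state and its label in one place makes it easy to transfer the same affordance to a different input device (controller click, pinch, or direct touch all call `on_click`).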
Virtual activation
This option is often very effective and natural within the context of a game. Common examples are rubbing a magic lamp, standing in front of a magic mirror, or talking to a non-player character (NPC). The primary use case is voice-driven gameplay.
Pros
- Preserves immersion—it appears as part of the normal user activity and doesn’t disrupt the flow of the game.
- Useful for voice-driven gameplay and novel interactions—it helps to preserve the user’s focus on the game
- Contextual—it’s only available at the time when it can be used, so there’s little chance of accidental activation
Cons
- Discoverability may be low—unless the user knows that this option is available, they may miss it. For example, they may simply walk right by the NPC who might otherwise talk to them.
- Difficulty in conveying state—transparency is important to maintain confidence in an app or game, and this is difficult with a virtual activation.
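The contextual quality described above can be expressed as a simple gate: voice input is only offered while the player is in range of, and facing, the virtual trigger. This is a hedged sketch with assumed names and an assumed 2-meter radius; positions are (x, z) tuples and `facing_npc` would come from the game engine's orientation check.

```python
def npc_can_listen(player_pos, npc_pos, facing_npc, radius=2.0):
    """Virtual-activation sketch: voice input is only available while the
    player is near an NPC and facing it, so accidental activation is rare.
    `radius` (meters) is an assumed tuning value."""
    dx = player_pos[0] - npc_pos[0]
    dz = player_pos[1] - npc_pos[1]
    within_range = (dx * dx + dz * dz) ** 0.5 <= radius
    return within_range and facing_npc
```

Because the check fails everywhere else in the world, the design concern shifts from accidental activation to discoverability: the NPC still needs a visible cue so the player knows it can listen.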
Gaze activation
This option uses the first-person perspective and eye tracking as the activation method. It can be combined with other gestures or game elements to provide an immersive experience. For example, the user might gaze at a non-player character and wave to speak, or lock onto a target during gameplay to issue a voice command.
Pros
- Accessible when hands are not—this can both improve accessibility and provide greater flexibility in physical interactions with non-player characters. It can be particularly useful when hand-tracking is being used
- Reduces accidental activations when combined with other inputs—making the activation multi-step increases certainty that it's intentional
- Natural form of interaction with a lower learning curve—it can provide a faster and more intuitive way to activate
Cons
- Accidental activation when used alone—head/eye gaze is notoriously difficult to control with precision. This can reduce accuracy when trying to trigger activation and increase the possibility of unintentional activation.
- Needs clear listening affordance—it’s essential to provide a clear indication when this action has successfully activated, because false activation without an indication of system state can be a privacy risk.
Examples
Look at a world-locked target
Look at a body-locked target
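One way to implement the multi-step pattern described above is to require a short gaze dwell before a confirming gesture counts. The sketch below is illustrative only: the class name, event methods, and the 0.3-second dwell threshold are all assumptions, and the `listening` flag is what should drive the visible listening affordance.

```python
class GazeActivation:
    """Sketch of multi-step activation: gaze at a target, then confirm
    with a wave gesture. Requiring sustained gaze before the wave counts
    reduces accidental activation from stray glances."""

    MIN_DWELL = 0.3  # seconds of sustained gaze before a wave counts (assumed)

    def __init__(self):
        self.gaze_started_at = None
        self.listening = False

    def on_gaze_enter(self, now):
        self.gaze_started_at = now

    def on_gaze_exit(self):
        self.gaze_started_at = None

    def on_wave(self, now):
        # Activate only if the user is still gazing and has dwelled long enough.
        if (self.gaze_started_at is not None
                and now - self.gaze_started_at >= self.MIN_DWELL):
            self.listening = True
        return self.listening
```

A wave with no active gaze, or a wave during a brief glance, does nothing; this is the "increased certainty" trade-off from the pros list, paid for with a slightly slower activation.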
Gesture activation
Activation by hand gesture is useful within the context of voice-driven gaming. It can be combined with head/eye gaze or used on its own, as when making a gesture with a wand. Use cases include voice-driven gameplay and multi-step activation.
Pros
- Accessible when not using controllers—this can both improve accessibility and provide greater flexibility in physical interactions with non-player characters.
- Preserves immersion—it appears as part of the normal user activity and doesn’t disrupt the flow of the game.
Cons
- Could be activated by mistake—hand gestures are easier to control with precision than head/eye gaze, but many users find it difficult to keep their hands still. Stray movements are less likely to affect accuracy when deliberately triggering activation, but they increase the possibility of unintentional activation.
- Needs clear listening affordance—it's essential to provide a clear indication when this action has successfully activated, because false activation without an indication of system state is a real privacy risk.
- Might be constrained by lighting conditions—hand tracking can degrade in poor light
- Can cause fatigue—repeating the gesture takes time and is likely to tire the user with repeated use
Example
Use hand distance and gesture together to activate the voice feature
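Combining a gesture with a distance gate, as in the example above, can be sketched as a single predicate. The gesture name, the hand-to-head distance parameter, and the 0.35-meter threshold below are assumptions for illustration; the idea is that the distance check filters out stray pinches made with the hands at rest.

```python
def gesture_activates_voice(gesture, hand_to_head_dist, max_dist=0.35):
    """Sketch of gesture-based activation: a pinch made with the hand
    raised near the head (within `max_dist` meters) starts listening.
    The distance gate reduces accidental activation from idle hand movement.
    Names and thresholds are assumed, not from any specific SDK."""
    return gesture == "pinch" and hand_to_head_dist <= max_dist
```

When this predicate returns true, the app should immediately show its listening affordance, since a gesture-only activation otherwise gives the user no confirmation of system state.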