Hands
Discover how hands can be used to enhance experiences. This page focuses on hands as an input method for immersive experiences: its benefits and challenges, how it works, ergonomic suggestions, design principles, and dos and don’ts.
Humans use their hands to interact effectively with their surroundings, facilitating tasks such as tool usage, communication, and tactile exploration.
Similarly, in immersive experiences, hands are a vital input method. They facilitate natural interaction by reflecting their real-world role and extending it with gesture-based indirect control of virtual content.
These are the different parts, characteristics and frequently used terms to be familiar with:
Joints | Joints in the hand provide flexibility and mobility. In XR, these are referenced in code to determine interactions. |
Wrist | The joint connecting the hand with the forearm, often tracked in code for hand position. |
Fingers & thumb | The five digits of the human hand, consisting of four fingers and one opposable thumb, are essential for performing tactile interactions. |
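To make the terms above concrete, the sketch below models a hand in code as an array of joint poses indexed by an enum, loosely following the joint naming used by the OpenXR XR_EXT_hand_tracking extension. The types and names are hypothetical illustrations, not a specific SDK’s API.

```cpp
#include <array>
#include <cstdio>

// Hypothetical data types; real runtimes (e.g. OpenXR's XR_EXT_hand_tracking)
// expose equivalent per-joint poses each frame.
struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };
struct Pose { Vec3 position; Quat orientation; };

// A small subset of commonly referenced joints (naming loosely follows OpenXR).
enum HandJoint { Wrist = 0, Palm, ThumbTip, IndexTip, MiddleTip, JointCount };

using HandPose = std::array<Pose, JointCount>;

// Example: use the wrist joint as the hand's overall position, as described above.
Vec3 HandPosition(const HandPose& hand) {
    return hand[Wrist].position;
}

int main() {
    HandPose hand{};                        // would be filled from the tracking runtime
    hand[Wrist].position = {0.1f, 1.2f, -0.3f};
    Vec3 p = HandPosition(hand);
    std::printf("wrist at (%.2f, %.2f, %.2f)\n", p.x, p.y, p.z);
}
```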
This section provides insights into Hand Tracking (HT) technology: how it functions, with a special focus on calibration and accuracy, its inherent technological limitations, and the measures you can take to mitigate them.
The Hand Tracking algorithm detects and tracks the hands using the cameras on the Meta Quest headset.
Hand tracking technology utilizes the sensors and cameras on the Meta Quest headset (HMD) to capture hand and finger positions and movements. This data is processed by software using algorithms, often involving machine learning and computer vision, to recognize various hand positions and movements. The system then interprets these movements as specific commands or actions, allowing user interaction with the virtual or blended environment. Various factors can influence accuracy, including hand location relative to cameras, lighting conditions, occlusions, and more. For further details, refer to the section
limitations and mitigations. Understanding these factors is crucial for designing an immersive experience.
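As an illustration of how joint data can be interpreted as a command, the sketch below detects a pinch from the distance between the thumb tip and index fingertip, with hysteresis to avoid flicker near the boundary. The types and threshold values are assumptions chosen for illustration, not taken from any SDK.

```cpp
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

float Distance(Vec3 a, Vec3 b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Interpret a "pinch" command from the gap between thumb tip and index tip.
// Two thresholds (hysteresis) avoid flicker near the boundary; the values
// here (in meters) are illustrative, not taken from any SDK.
bool UpdatePinch(bool wasPinching, Vec3 thumbTip, Vec3 indexTip) {
    const float kPinchOn  = 0.02f;  // close the fingers this far to start the pinch
    const float kPinchOff = 0.04f;  // open wider than this to release it
    float gap = Distance(thumbTip, indexTip);
    if (wasPinching) return gap < kPinchOff;
    return gap < kPinchOn;
}

int main() {
    Vec3 thumb{0.00f, 0.00f, 0.00f};
    Vec3 index{0.015f, 0.00f, 0.00f};   // 1.5 cm apart: starts a pinch
    bool pinching = UpdatePinch(false, thumb, index);
    std::printf("pinching: %s\n", pinching ? "yes" : "no");
}
```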
Hand input accuracy is achieved by precisely tracking and recognizing hand movements, with the focus on the hand’s pose (position and orientation) across frames. Abrupt or shaky movements are smoothed to achieve lifelike motion.
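As a simplified illustration of smoothing (the actual runtime uses more sophisticated filtering), the sketch below applies an exponential moving average to a jittery position sample each frame. The types and the alpha value are assumptions for illustration only.

```cpp
#include <cstdio>

struct Vec3 { float x, y, z; };

// Exponential moving average: blend the raw measurement toward the previous
// smoothed value. alpha in (0, 1]; smaller alpha = smoother but more latency.
Vec3 Smooth(Vec3 previous, Vec3 raw, float alpha) {
    return { previous.x + alpha * (raw.x - previous.x),
             previous.y + alpha * (raw.y - previous.y),
             previous.z + alpha * (raw.z - previous.z) };
}

int main() {
    Vec3 smoothed{0.0f, 1.0f, -0.3f};
    Vec3 noisy{0.05f, 1.02f, -0.28f};      // a jittery new sample
    smoothed = Smooth(smoothed, noisy, 0.25f);
    std::printf("(%.3f, %.3f, %.3f)\n", smoothed.x, smoothed.y, smoothed.z);
}
```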
Calibration involves the device recognizing the hand and measuring the distance between specific points to set the hand’s scale, which is essential for accurately sizing the hand mesh. To enhance hand tracking quality in various conditions, the model is trained on a diverse dataset that encompasses different lighting environments, hand shapes, skin tones, distances, poses, and orientations.
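Purely as an illustration of the scaling idea (not the actual calibration procedure), the sketch below derives a uniform hand-mesh scale by comparing a measured span between two tracked points against the same span on a reference mesh. The types, chosen points, and reference length are assumptions.

```cpp
#include <cmath>
#include <cstdio>

struct Vec3 { float x, y, z; };

float Distance(Vec3 a, Vec3 b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Illustrative only: derive a uniform scale for the hand mesh by comparing a
// measured distance between two tracked points (e.g. wrist to middle fingertip)
// against the length of the same span on the reference mesh.
float HandMeshScale(Vec3 wrist, Vec3 middleTip, float referenceLengthMeters) {
    float measured = Distance(wrist, middleTip);
    return measured / referenceLengthMeters;
}

int main() {
    Vec3 wrist{0.0f, 0.0f, 0.0f};
    Vec3 middleTip{0.0f, 0.19f, 0.0f};                       // a fairly large hand
    float scale = HandMeshScale(wrist, middleTip, 0.18f);    // assumed 18 cm reference span
    std::printf("mesh scale: %.2f\n", scale);
}
```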
Limitations and mitigations
This section examines the inherent limitations and challenges that impact performance and usability and offers strategies for mitigation through design or code improvements.
The headset’s sensors track within a specific area known as the tracking volume. Hands outside this area, such as hands resting by the thighs or held too close to the device (for example, on the cheek), will not be detected. The hand-tracking volume exceeds the display’s field of view, allowing tracking even when hands are not visible to the user. The tracking volume varies depending on the hardware used.
Mitigation - Code: To compensate for hands moving outside the camera’s field of view, body tracking techniques are employed to infer the positions of the hands.
Mitigation - Design: To ensure that all relevant user motions are accurately tracked, it is recommended to design interactions that keep users’ movements within the tracking volume, avoiding the need for them to reach beyond it. For more guidance, refer to the
comfort page.
Sensors play a vital role in tracking hand movements in virtual environments. They perform best when they have a clear view of the user’s hands. However, when parts of the hand are not visible to the sensors, the system attempts to estimate their positions. If significant portions of the hand are obscured, the virtual representation of the hand will disappear. The hand pose prediction model provides two key outputs: the estimated hand pose and the level of confidence in that prediction. If the confidence level falls below a certain threshold, the system will not display the hand. Obstacles that block the sensors’ view can include elements from the surrounding environment or the user’s own body; the latter are known as self-occlusions.
Occlusions happen when hands are behind obstacles or interact with physical objects. Tracking accuracy can be affected by common obstructions such as long sleeves, hands in pockets, large rings, or black bandaids.
Self-occlusions include scenarios where fingers are covered by other fingers, fingers point outward, obscuring them from the head-mounted display (HMD) cameras, or hands overlap. These factors collectively influence the effectiveness of hand tracking technology.
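Reflecting the two outputs described above (an estimated pose plus a confidence level), an application typically checks whether tracking is currently reliable before rendering the hand. The sketch below is a hypothetical illustration; the field names are invented and do not correspond to a specific SDK’s API.

```cpp
#include <cstdio>

// Hypothetical per-frame tracking result: a pose estimate plus the runtime's
// confidence in it, mirroring the two outputs described above.
enum class Confidence { Low, High };

struct HandTrackingState {
    bool       isTracked;     // false when the hand has left the tracking volume
    Confidence confidence;    // runtime's confidence in the estimated pose
};

// Only show the virtual hand when tracking is active and confidence is high;
// otherwise hide it rather than render a misleading pose.
bool ShouldRenderHand(const HandTrackingState& state) {
    return state.isTracked && state.confidence == Confidence::High;
}

int main() {
    HandTrackingState occluded{true, Confidence::Low};   // e.g. heavy self-occlusion
    std::printf("render hand: %s\n", ShouldRenderHand(occluded) ? "yes" : "no");
}
```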
Mitigation - Design: Aim to design interactions with minimal hand overlap; for example, having the index finger interact with a wrist button or palm menu on the other hand involves only minimal overlap. The more one hand covers the other, the lower the accuracy. Additionally, interactions that can be performed with just one hand are always a good choice, as they not only enhance accessibility but also require less physical effort. This approach allows the hand tracking engine to effectively track both hands independently.
The quality and intensity of light can impact the system’s ability to accurately detect hand movements. Hand tracking has different lighting requirements than inside-out (head) tracking. In some situations, this could result in functional differences between head tracking and hand tracking, where one may work while the other has stopped functioning.
Mitigation - Design: Immersive experience headsets adjust to different light conditions. For optimal functionality, advise users to be aware of their environment’s lighting, perhaps by including a note when launching an application.
Rapid or erratic hand movements can challenge the system’s tracking accuracy. For example, when throwing a virtual object, the hand moves quickly, and the camera may not immediately capture the exact point of release. Nevertheless, the Move Fast app successfully demonstrates that playing fast-paced rhythm games using hand gestures is indeed possible.
Mitigation - Design: To improve tracking accuracy with fast-paced hand movements, consider these strategies:
- Centralize targets and encourage keeping hands at a moderate distance from the headset.
- Maintain a predictable target speed.
- Note that hand movements directed away from the player (such as punches) are tracked more effectively than movements parallel to the view (such as chops).
- Use visual cues to guide users in maintaining appropriate hand speeds and positions. For example, visually slowing down hands or targets can encourage users to adapt their speed.
By integrating these design elements, it is possible to significantly enhance the tracking performance of hand movements, particularly in fast-paced applications.
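For the throwing example mentioned above, a common application-side mitigation (sketched below with invented types, not taken from any particular SDK) is to keep a short buffer of recent hand positions and derive the release velocity from that window, so a single missed or noisy frame at the moment of release has less impact.

```cpp
#include <cstddef>
#include <cstdio>
#include <deque>

struct Vec3 { float x, y, z; };
struct Sample { Vec3 position; float time; };

// Keep the last few frames of hand positions and estimate velocity from the
// oldest and newest samples. Averaging over a small window makes the thrown
// object's velocity less sensitive to one noisy or missed frame.
class VelocityEstimator {
public:
    void AddSample(Vec3 position, float time) {
        samples_.push_back({position, time});
        if (samples_.size() > kWindow) samples_.pop_front();
    }
    Vec3 Velocity() const {
        if (samples_.size() < 2) return {0, 0, 0};
        const Sample& a = samples_.front();
        const Sample& b = samples_.back();
        float dt = b.time - a.time;
        if (dt <= 0.0f) return {0, 0, 0};
        return { (b.position.x - a.position.x) / dt,
                 (b.position.y - a.position.y) / dt,
                 (b.position.z - a.position.z) / dt };
    }
private:
    static constexpr std::size_t kWindow = 8;   // ~8 frames of history (illustrative)
    std::deque<Sample> samples_;
};

int main() {
    VelocityEstimator est;
    for (int i = 0; i < 5; ++i)
        est.AddSample({0.1f * i, 1.0f, -0.3f}, i / 60.0f);  // fast sideways motion
    Vec3 v = est.Velocity();
    std::printf("release velocity: (%.2f, %.2f, %.2f) m/s\n", v.x, v.y, v.z);
}
```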
Tracking accuracy may be influenced by individual user traits. Factors such as long fingernails, specific types of nail polish, tattoos on the hands, accessories like rings, long sleeves, and diverse hand sizes and shapes can all impact performance.
Mitigation - Design: Enhance user interaction by offering a variety of input modalities and interaction choices. For instance, while long nails may hinder poke interactions, pinch actions using ray casting could be more effective. Alternatively, using a controller can serve as a reliable fallback option. For more information, please refer to the section on
accessibility.
This section offers guidance on hand-based interaction techniques. Discover input primitives, understand design principles, consider ergonomic factors, and learn the essential dos and don’ts.
Discover various input capabilities and interaction methods that use hands as the input modality:
Targeting | Target objects with the hands in two ways: directly, as in real life, by touching an interactable, or indirectly, using a ray cast. |
Selection | Lets the user choose or activate an interactable with the hand, for example by performing a tap gesture. |
Pose | A static moment in which the joints of the hand are held in a specific pose/orientation to communicate an action or command. Examples include pinch, grab, and custom poses. |
Gesture | A specific movement (a sequence of poses) made by the hand, for example a swipe gesture: waving the hand in a direction. |
Gate | Gating transitions the hand from an "idle" state to an "active" state. These transitions are performed using a pose or gesture. |
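As a simple illustration of gating, the hypothetical sketch below (not tied to any SDK; the dwell threshold is illustrative) keeps the hand in an idle state until an activation pose, here a pinch held for a short time, transitions it to active.

```cpp
#include <cstdio>

// Minimal gate: stay "idle" until the activation pose (e.g. a pinch) has been
// held continuously for a short dwell time, then switch to "active".
enum class HandState { Idle, Active };

class PinchGate {
public:
    HandState Update(bool isPinching, float deltaSeconds) {
        if (isPinching) {
            heldSeconds_ += deltaSeconds;
            if (heldSeconds_ >= kDwellSeconds) state_ = HandState::Active;
        } else {
            heldSeconds_ = 0.0f;
            state_ = HandState::Idle;
        }
        return state_;
    }
private:
    static constexpr float kDwellSeconds = 0.3f;  // hold the pinch ~0.3 s to activate
    float heldSeconds_ = 0.0f;
    HandState state_ = HandState::Idle;
};

int main() {
    PinchGate gate;
    HandState s = HandState::Idle;
    for (int frame = 0; frame < 30; ++frame)        // simulate 30 frames at 60 Hz
        s = gate.Update(true, 1.0f / 60.0f);        // pinch held the whole time
    std::printf("state: %s\n", s == HandState::Active ? "active" : "idle");
}
```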
For a comprehensive overview of all input modalities and their corresponding input primitives, please visit the following page:
input primitives
This section explores the fundamental concepts that shape intuitive and user-friendly interactions for the hands input modality.
Usability | The presence of hands sets user expectations to at least mimic real-world capabilities and potentially grant superpowers. Users anticipate the ability to pick up objects, push buttons, and execute actions beyond the bounds of reality. |
Accessibility | Hand-based inputs should be designed with accessibility in mind. For users with limited hand mobility, alternative input methods or adaptive controllers should be provided. For further guidance, refer to the [accessibility](/design/accessibility) page. |
Multimodal | Multimodal interaction means providing multiple input modalities, such as controllers, hands, voice, and more, for the user to interact with. By incorporating multiple modalities into an application, users can naturally choose the interaction method that serves them best in a given moment. This creates a more seamless and enjoyable experience, as well as increased accessibility. |
Affordances | It's crucial to provide distinct affordances, feedback, and signifiers. For instance, when a user interacts with an object, such as picking it up, they should receive immediate and clear visual and auditory cues confirming the successful completion of the action. This enhances the user's understanding and interaction within the environment. |
Remember that hands aren’t controllers | It’s very tempting to simply adapt existing interactions from input devices like the controllers, and apply them to hand tracking. But that process will limit one to already-charted territory, and may lead to interactions that would feel better with controllers while missing out on the benefits of hands. Instead, focus on the unique strengths of hands as an input and add devices or virtual tools to empower the user to be more successful. |
Constraints to improve usability | Hands are practically unlimited in how they move and the poses they can form. This presents a world of opportunities, so pay special attention to the included guidelines that navigate limitations. For example, components like the hand UI have recommended placements at certain areas on the hand. These areas are recommended because hand tracking can have difficulty when one hand occludes the other too much. These limitations help increase accuracy, and actually make it easier to navigate the system or complete an interaction. |