Our experience in the world is multimodal. As we interact with the world, we use our eyes and ears to experience our surroundings. We grasp or manipulate objects with our hands, move through spaces, and even use tools and devices to perform tasks more efficiently. Similarly, interaction in mixed reality is also multimodal: from moment to moment, a user may choose different inputs or tools to accomplish a task in whatever way feels best.
Multimodal input support in Horizon OS
Horizon OS supports various input sources, including spatial inputs like controllers and styluses, traditional hardware such as gamepads, keyboards, and mice, and even a user’s hands. Each input has its strengths and weaknesses, and the system aims to switch between them seamlessly.
In particular, controllers and hands are considered primary input methods, meaning the system prioritizes them across the entire experience.
For Horizon OS, our primary input modalities are raycasting and direct touch. We also support undirected input via keyboard or controller buttons, and cursor or directional input using mice or controller sticks. The system dynamically switches between these sources and modalities as needed, to ensure users can accomplish tasks consistently and efficiently.
Input switching
Input handoff in Horizon OS is designed to provide a seamless transition between different input methods, such as hand tracking and controller input.
Switching between hand tracking and controllers depends on the experience or application. For example, you might start with hand tracking and then switch to a controller for more precision or when an action requires a trigger or thumbstick input.
Input devices generally take precedence over body-based input. Picking up a device signals the user’s intent to engage with that specific input modality. For example, when a user picks up controllers or a mouse, hand tracking is automatically disabled and device input is prioritized.
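For native apps built directly on OpenXR, one way to observe this handoff is to query the current interaction profile for each hand and re-check it whenever the runtime reports a profile change. The sketch below is a minimal illustration rather than a Horizon OS-specific API: it assumes an existing XrInstance and XrSession from the app’s normal setup, and the exact profiles reported depend on which extensions (for example, XR_EXT_hand_interaction) are enabled.

```cpp
#include <openxr/openxr.h>
#include <cstdio>

// Report which input is currently driving each hand.
// Assumes `instance` and `session` come from the app's normal OpenXR setup.
void LogActiveInputPerHand(XrInstance instance, XrSession session) {
    const char* topLevelPaths[] = {"/user/hand/left", "/user/hand/right"};

    XrPath touchController = XR_NULL_PATH;
    XrPath handInteraction = XR_NULL_PATH;
    xrStringToPath(instance, "/interaction_profiles/oculus/touch_controller", &touchController);
    // Requires XR_EXT_hand_interaction; other hand profiles may be reported instead.
    xrStringToPath(instance, "/interaction_profiles/ext/hand_interaction_ext", &handInteraction);

    for (const char* pathString : topLevelPaths) {
        XrPath userPath = XR_NULL_PATH;
        xrStringToPath(instance, pathString, &userPath);

        XrInteractionProfileState state{XR_TYPE_INTERACTION_PROFILE_STATE};
        if (XR_FAILED(xrGetCurrentInteractionProfile(session, userPath, &state))) {
            continue;
        }

        if (state.interactionProfile == touchController) {
            printf("%s: controller\n", pathString);
        } else if (state.interactionProfile == handInteraction) {
            printf("%s: hand tracking\n", pathString);
        } else {
            printf("%s: no input bound / other profile\n", pathString);
        }
    }
}

// Re-running this check when xrPollEvent delivers
// XR_TYPE_EVENT_DATA_INTERACTION_PROFILE_CHANGED keeps the app's affordances
// in sync with the system's hand/controller handoff.
```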
Default inputs: hands and controllers
Hand tracking: On Meta Quest, hand tracking is enabled by default when no controllers are used.
Controller pickup: When a user picks up a controller, the system detects this and automatically switches from hand tracking to controller tracking for that hand, on the assumption that the user intends to use the controller as their primary input method. Meta Horizon OS supports concurrent tracking of hands and controllers, so the user can continue to use hand tracking with the other hand.
Returning to hand tracking: If the user puts down the controller and it isn’t used for a short while, the system automatically switches back to hand tracking. There may be a slight delay during this transition as the system re-establishes hand tracking.
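At the application level, the XR_EXT_hand_tracking extension exposes this availability directly: xrLocateHandJointsEXT reports whether hand data is currently being delivered for a hand, so an app can fall back gracefully while the system re-establishes tracking. A minimal sketch, assuming the extension is enabled and the function pointer has been loaded with xrGetInstanceProcAddr:

```cpp
#include <openxr/openxr.h>

// Returns true when hand-tracking data is available for the given tracker this frame.
// `locate` is the xrLocateHandJointsEXT pointer loaded via xrGetInstanceProcAddr;
// `appSpace` and `time` come from the app's frame loop.
bool IsHandTracked(PFN_xrLocateHandJointsEXT locate,
                   XrHandTrackerEXT handTracker,
                   XrSpace appSpace,
                   XrTime time) {
    XrHandJointLocationEXT joints[XR_HAND_JOINT_COUNT_EXT];

    XrHandJointLocationsEXT locations{XR_TYPE_HAND_JOINT_LOCATIONS_EXT};
    locations.jointCount = XR_HAND_JOINT_COUNT_EXT;
    locations.jointLocations = joints;

    XrHandJointsLocateInfoEXT locateInfo{XR_TYPE_HAND_JOINTS_LOCATE_INFO_EXT};
    locateInfo.baseSpace = appSpace;
    locateInfo.time = time;

    if (XR_FAILED(locate(handTracker, &locateInfo, &locations))) {
        return false;
    }
    // isActive is false whenever the runtime is not providing hand data for this
    // hand (for example, tracking is lost or the hand is driving a controller),
    // letting the app fall back to other input without a hard mode switch.
    return locations.isActive == XR_TRUE;
}
```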
Other inputs
Mouse
When a user moves a Bluetooth-connected mouse, the mouse pointer appears. Because this pointer conflicts with other targeting mechanisms such as the controller ray or hand ray, those input methods are automatically deactivated.
Keyboard
When the user starts typing keys on a Bluetooth-connected keyboard, other input modalities are automatically deactivated.
Gamepad
When the user starts pressing buttons on a Bluetooth-connected gamepad, other input modalities are automatically deactivated. Optionally, Horizon OS supports headset gaze and a cursor for targeting, which can be combined with the gamepad’s buttons for selection.
Stylus
In Horizon OS, stylus input functions similarly to controller input. Users can target objects using the ray and cursor emitted from the stylus tip, and select items with a button press. Additionally, the stylus can be used in conjunction with a controller.
Interaction distance
In mixed reality environments, users can move freely and interact with various objects in their physical environment. To accommodate this flexibility, it’s best to design experiences that support both near-field and far-field interactions, enabling users to select their preferred input method and distance based on the context and their personal preference.
Horizon OS supports both near-field (direct) and far-field (indirect) interactions by dynamically showing and hiding the appropriate visual affordances based on the distance between a user’s hands and the target objects or UI elements.
Video 1: Visual affordances based on distance
Hand distance to the surface is a strong signal for interaction intent. When virtual objects or UI elements are within an arm’s reach, a user can directly interact with them similar to how they would interact with objects in the physical world. For example, they might reach out and grab, push, or poke objects that are close to them.
Far-field interactions involve interacting with objects or UI elements that are beyond arm’s reach, typically targeting with raycasting and selecting with controller buttons or pinch gestures.
Video 2: Interaction distance
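In an app, this kind of distance-based switching is commonly implemented as a threshold with hysteresis so the affordance does not flicker while the hand hovers near the boundary. The sketch below is illustrative only; the threshold values are assumptions to tune for your content, not Horizon OS constants.

```cpp
enum class InteractionMode { Direct, Ray };

// Illustrative thresholds only; tune for your content. The gap between the two
// values provides hysteresis so the affordance does not flicker at the boundary.
constexpr float kEnterDirectMeters = 0.35f;  // switch to direct touch inside this range
constexpr float kExitDirectMeters  = 0.45f;  // switch back to ray beyond this range

InteractionMode UpdateInteractionMode(InteractionMode current, float handToTargetMeters) {
    if (current == InteractionMode::Ray && handToTargetMeters < kEnterDirectMeters) {
        return InteractionMode::Direct;   // hand is close enough: show direct-touch affordances
    }
    if (current == InteractionMode::Direct && handToTargetMeters > kExitDirectMeters) {
        return InteractionMode::Ray;      // hand moved away: show ray + cursor affordances
    }
    return current;                       // inside the hysteresis band: keep the current mode
}
```

Calling a function like this each frame with the distance from the hand (or fingertip) to the nearest interactive surface yields a stable choice between direct-touch and ray affordances.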
Hybrid input
Hand and controller
In some scenarios, you can use a controller in one hand and keep the other hand free for hand tracking. This setup is useful when you need precise input from a controller, such as pointing or selecting objects, while still using hand gestures like swiping, pinching, or grabbing with the other hand.
The Meta Quest system can detect when a user is using a controller in one hand and when the other hand is free for hand tracking. The system will automatically blend the inputs, allowing users to switch between gestures and controller actions without interruption.
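In OpenXR terms, much of this blending falls out of the action system: the same logical action can be bound in both the Touch controller profile and the hand-interaction profile, so whichever input is active on a given hand drives it. A hedged sketch of the binding step, assuming an instance and a float-typed “select” action created elsewhere in the app, with XR_EXT_hand_interaction enabled:

```cpp
#include <openxr/openxr.h>
#include <vector>

// Suggest bindings so one logical "select" action works for both a held controller
// and a tracked hand. `selectAction` is assumed to be a float
// (XR_ACTION_TYPE_FLOAT_INPUT) action created with the app's action set.
void SuggestSelectBindings(XrInstance instance, XrAction selectAction) {
    XrPath triggerLeft, triggerRight, pinchLeft, pinchRight;
    xrStringToPath(instance, "/user/hand/left/input/trigger/value", &triggerLeft);
    xrStringToPath(instance, "/user/hand/right/input/trigger/value", &triggerRight);
    xrStringToPath(instance, "/user/hand/left/input/pinch_ext/value", &pinchLeft);
    xrStringToPath(instance, "/user/hand/right/input/pinch_ext/value", &pinchRight);

    // Controller profile: the trigger drives "select".
    XrPath touchProfile;
    xrStringToPath(instance, "/interaction_profiles/oculus/touch_controller", &touchProfile);
    std::vector<XrActionSuggestedBinding> touchBindings = {
        {selectAction, triggerLeft}, {selectAction, triggerRight}};
    XrInteractionProfileSuggestedBinding suggest{XR_TYPE_INTERACTION_PROFILE_SUGGESTED_BINDING};
    suggest.interactionProfile = touchProfile;
    suggest.suggestedBindings = touchBindings.data();
    suggest.countSuggestedBindings = static_cast<uint32_t>(touchBindings.size());
    xrSuggestInteractionProfileBindings(instance, &suggest);

    // Hand-interaction profile: the pinch drives the same "select" action.
    XrPath handProfile;
    xrStringToPath(instance, "/interaction_profiles/ext/hand_interaction_ext", &handProfile);
    std::vector<XrActionSuggestedBinding> handBindings = {
        {selectAction, pinchLeft}, {selectAction, pinchRight}};
    suggest.interactionProfile = handProfile;
    suggest.suggestedBindings = handBindings.data();
    suggest.countSuggestedBindings = static_cast<uint32_t>(handBindings.size());
    xrSuggestInteractionProfileBindings(instance, &suggest);
}
```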
Poke with hand while holding controllers
Horizon OS allows users to interact with the UI using their hands while holding controllers. This enables users to naturally interact with targets that are close to them.
Stylus and controller
Stylus input can be used in conjunction with a controller. This combination enables creative applications that leverage sophisticated tools and menus on the controller while simultaneously allowing users to draw and sketch with the stylus.
Best practices
Ensure usable target sizes for the primary forms of input
Avoid providing multiple input sources with the same capability simultaneously
Optimize target size and visual feedback for the primary input method’s precision level and interaction distance (see the sketch after this list)
Input ranking may vary depending on task and content type
Direct touch is not a ‘mode’ and should always be available
Avoid artificial distinction between near-field and far-field interaction
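For the target-size guideline, the useful relationship is that a target’s physical size must scale with interaction distance to preserve its angular size. The small helper below illustrates the geometry; the 1.5-degree example is an illustrative assumption, not a Horizon OS requirement.

```cpp
#include <cmath>

// Minimum physical target width (meters) needed to subtend `angularSizeDegrees`
// at `distanceMeters`. Example: a 1.5-degree target at 1 m is roughly 2.6 cm wide,
// and roughly 5.2 cm wide at 2 m.
float MinTargetSizeMeters(float distanceMeters, float angularSizeDegrees) {
    const float kPi = 3.14159265f;
    const float radians = angularSizeDegrees * kPi / 180.0f;
    return 2.0f * distanceMeters * std::tan(radians / 2.0f);
}
```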