Scene understanding
What is scene understanding?
Scene understanding lets you use the physical environment as a canvas.
With scene understanding capabilities, apps can make digital objects interact with surfaces in the physical environment. These surfaces support various use cases, including content placement, physics, and navigation. With the Scene API, developers can use the scene model to bounce virtual balls off actual physical surfaces or have virtual robots scale physical walls.
In addition to representing surfaces as 2D planes or 3D volumes, scene understanding provides semantic labels for them, such as floor, ceiling, wall, desk, and couch. Using this semantic information, designers can create unique and exciting experiences. For example, open up the ceiling plane and replace it with a dark sky full of stars and the Milky Way, or make a character walk on the floor and sit on the couch.
Figure 1: Semantic labels provided by scene understanding
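Conceptually, the scene model is a collection of labeled surfaces. The sketch below is a minimal, hypothetical representation for illustration; the struct and label names are assumptions, not actual Scene API types:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified scene model for illustration only; the
// actual Scene API exposes richer anchor and component types.
struct Vec3 { float x, y, z; };

struct ScenePlane {
    std::string label;    // e.g. "FLOOR", "CEILING", "WALL_FACE", "TABLE", "COUCH"
    Vec3 center;          // plane center in world space
    Vec3 normal;          // unit plane normal in world space
    float width, height;  // 2D extents of the plane
};

// A scene model is then just the set of labeled surfaces captured
// for the user's room.
using SceneModel = std::vector<ScenePlane>;
```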
What can we do with scene understanding?
Placing virtual content in the physical space
Using the planes identified by scene understanding, place objects on physical surfaces. For example, place a digital picture frame on a wall or a game object on a table. With semantic surface-type information, design more natural interactions with the environment by making objects placeable only on specific surface types, such as the floor.
When placing virtual objects realistically, it is important to provide the cues expected of real objects, such as shadows and surface alignment. With these cues, virtual objects become more believable and usable, since the user doesn’t have to relearn different behavior.
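As a sketch of this idea, the snippet below (reusing the hypothetical scene-model types from the previous sketch) keeps only the surfaces with a required semantic label as placement candidates:

```cpp
// Return the surfaces an object may be placed on, restricted to a
// semantic type such as "FLOOR" or "TABLE".
std::vector<const ScenePlane*> PlacementCandidates(
        const SceneModel& scene, const std::string& requiredLabel) {
    std::vector<const ScenePlane*> candidates;
    for (const ScenePlane& plane : scene) {
        if (plane.label == requiredLabel) {
            candidates.push_back(&plane);
        }
    }
    return candidates;
}
```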
The Mars Rover, on the left of the image below, is properly aligned with the table surface. It appears to be on the table, and it’s easy to estimate how far one needs to reach to interact with it. The object on the right, however, is placed randomly in space and misaligned with the surface, making it difficult to determine its location and distance.
Figure 2: Aligning a virtual object on a desk
When a virtual object is not properly aligned with a surface, it becomes difficult to understand its location and distance.
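The alignment itself comes down to rotating the object so that its up axis matches the surface normal. A minimal sketch, assuming unit vectors and a simple quaternion type (both illustrative, not tied to any particular engine):

```cpp
#include <cmath>

// Hypothetical quaternion type for illustration.
struct Quat { float w, x, y, z; };

float Dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

Vec3 Cross(const Vec3& a, const Vec3& b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}

// Shortest-arc rotation turning the object's up axis onto the surface
// normal, so the object sits flush on the plane instead of hanging
// askew. Assumes unit vectors that are not exactly opposite.
Quat AlignUpToNormal(const Vec3& up, const Vec3& normal) {
    Vec3 axis = Cross(up, normal);
    float w = 1.0f + Dot(up, normal);
    float len = std::sqrt(w * w + Dot(axis, axis));
    return { w / len, axis.x / len, axis.y / len, axis.z / len };
}
```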
Using surfaces and volumes, occlude digital objects with the real-world environment. This blends digital objects more naturally into the physical environment. Occlusion and shadows allow apps to create convincingly real virtual content in the physical world, resulting in more immersive mixed reality experiences.
Without occlusion, virtual objects appear to float inside passthrough, or on top of the physical world, regardless of where physical objects are placed. Apply proper occlusion by masking the parts of a virtual object that ought to appear behind physical objects in passthrough.
Occlusion can be achieved with either scene understanding’s bounding box information or the Depth API (Meta Quest 3 only), which leverages the depth camera of the Meta Quest to obtain depth information and seamlessly layer virtual objects in front of or behind physical objects based on their distance from the camera.
Figure 3: The character on the left is behind the table and properly occluded by it. The character on the right is rendered in front of the physical table regardless of its actual location. This can cause confusion about the character’s actual distance, which makes it difficult to interact with.
Video 1: The character is occluded by the physical environmental objects
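Conceptually, the per-pixel decision is a simple depth comparison; the sketch below is illustrative only, since in practice the test runs in a shader against the environment depth map:

```cpp
// Mask virtual fragments that lie behind the physical environment:
// render a fragment only when the virtual content is closer to the
// camera than the physical surface visible at that pixel.
bool ShouldRenderFragment(float virtualDepth, float environmentDepth) {
    return virtualDepth < environmentDepth;
}
```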
Physics is another powerful element that can enrich immersive experiences. With scene understanding, use physical surfaces, such as walls or the floor, as collision surfaces. This means a virtual ball can bounce off the floor and walls.
Video 2: This example shows that a ball can bounce off a physical wall
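The collision response for such a bounce is the standard reflection formula v' = v - 2(v·n)n, scaled by a restitution factor so the ball loses some energy. A minimal sketch, reusing the Vec3 and Dot helpers from the earlier sketches:

```cpp
// Reflect a ball's velocity off a surface with unit normal 'normal'.
Vec3 Bounce(const Vec3& velocity, const Vec3& normal, float restitution) {
    float vn = Dot(velocity, normal);
    if (vn >= 0.0f) return velocity;  // moving away; no response needed
    return { (velocity.x - 2.0f * vn * normal.x) * restitution,
             (velocity.y - 2.0f * vn * normal.y) * restitution,
             (velocity.z - 2.0f * vn * normal.z) * restitution };
}
```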
Using the surfaces of the physical environment, make an object navigate around the environment. For example, make a character walk only on the floor, just like Oppy in The World Beyond app.
Video 3: This example demonstrates how virtual objects can interact with various types of physical surfaces
Or, make the character navigate on any surface, including walls and tables.
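One simple way to keep a character on a given surface, such as the floor, is to project its desired position onto that plane each frame. A sketch, reusing the hypothetical ScenePlane type:

```cpp
// Project a position onto a plane along the plane's unit normal:
// compute the signed distance from the plane, then remove that
// component. Applying this to the character's desired position
// keeps it walking on the surface.
Vec3 ConstrainToPlane(const Vec3& position, const ScenePlane& plane) {
    Vec3 offset = { position.x - plane.center.x,
                    position.y - plane.center.y,
                    position.z - plane.center.z };
    float d = Dot(offset, plane.normal);
    return { position.x - d * plane.normal.x,
             position.y - d * plane.normal.y,
             position.z - d * plane.normal.z };
}
```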
Visualization of the physical surfaces
Scene understanding and passthrough allow us to build magical immersive experiences. In general, it is recommended to avoid or minimize visualization of physical surfaces, since it degrades the magic of virtual objects directly interacting with the physical environment. In certain cases, however, visualizing the surfaces can help users by providing context and improving their confidence that the device is aware of the environment.
In The World Beyond app, when the app is launched, it progressively reveals the user’s physical environment with a dimmed, grayscale passthrough background and animated white borders on the surfaces: walls, floors, ceilings, and furniture.
Video 4: This visualization provides the user confidence that the Meta Quest device is aware of the environment
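A minimal sketch of such a progressive reveal is a time-based alpha ramp on the border visualization (the function and parameters are illustrative assumptions, not how The World Beyond implements it):

```cpp
#include <algorithm>

// Fade the surface borders in over revealDuration seconds:
// fully hidden at launch, fully visible once the ramp completes.
float BorderAlpha(float secondsSinceLaunch, float revealDuration) {
    return std::clamp(secondsSinceLaunch / revealDuration, 0.0f, 1.0f);
}
```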
Realistically placing objects with shadows
Shadows are an important element that makes objects feel grounded. Place a halo or a drop shadow on the floor to “pin” an object to a particular space in passthrough.
Shadows not only make virtual objects more realistic, but also improve usability and safety. Since shadows communicate an object’s position and depth more clearly, users can more easily interact with objects when touching, grabbing, and placing them.
Virtual objects with shadows feel more realistic and grounded:
Figure 4: Shadow placement
Figure 5: Virtual objects on a table with shadows
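Placing the shadow itself can be as simple as projecting the object’s position onto the floor plane, lifted slightly to avoid z-fighting with the physical surface. A sketch reusing the projection helper above:

```cpp
// Position a drop-shadow quad directly beneath an object.
Vec3 ShadowPosition(const Vec3& objectPos, const ScenePlane& floor) {
    Vec3 onFloor = ConstrainToPlane(objectPos, floor);
    const float lift = 0.002f;  // 2 mm above the surface
    return { onFloor.x + lift * floor.normal.x,
             onFloor.y + lift * floor.normal.y,
             onFloor.z + lift * floor.normal.z };
}
```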
Leveraging semantic labels to build rich experiences
With semantic labels on the surfaces, we can build rich experiences by populating different virtual objects based on surface type. For example, cover the user’s floor with green grass and open up the ceiling to an infinite night sky full of stars.
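A minimal sketch of this pattern is a lookup from semantic label to the content spawned on it; the content names and the spawn call are hypothetical placeholders:

```cpp
#include <string>
#include <unordered_map>

// Map semantic labels to the virtual content populated on them.
std::unordered_map<std::string, std::string> BuildContentMap() {
    return {
        { "FLOOR",     "grass_patch" },
        { "CEILING",   "night_sky_portal" },
        { "WALL_FACE", "mural" },
    };
}

void PopulateScene(const SceneModel& scene) {
    auto contentForLabel = BuildContentMap();
    for (const ScenePlane& plane : scene) {
        auto it = contentForLabel.find(plane.label);
        if (it != contentForLabel.end()) {
            // SpawnOnPlane(it->second, plane);  // hypothetical engine call
        }
    }
}
```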