Multi Object Detection sample overview

Updated: May 7, 2026

Overview

This sample demonstrates how to build a complete computer vision pipeline on Meta Quest using the Passthrough Camera API and Unity Inference Engine. It captures the passthrough camera feed, runs YOLOv9 inference to detect real-world objects, and places persistent 3D markers at detected object locations using depth raycasting and spatial anchors.

What you will learn

  • Integrating the Passthrough Camera API with Unity Inference Engine for real-time object detection on Quest
  • Converting 2D ML bounding boxes into 3D world-space positions using depth raycasting
  • Implementing Non-Maximum Suppression (NMS) in C# to filter overlapping detections
  • Using OVRSpatialAnchor to world-lock detected objects and recover from tracking loss
  • Building a coroutine-based async inference pipeline that avoids blocking the main thread

Requirements

You need a Meta Quest 3 or Quest 3S running Horizon OS v74 or later. For development environment setup and specific SDK versions, see the Passthrough Camera API getting started guide and sample README.

Get started

Clone or download the Unity-PassthroughCameraApiSamples repository from GitHub and open the project in Unity. Navigate to the MultiObjectDetection scene and build to your Quest device. The sample requires Scene permission (com.oculus.permission.USE_SCENE) for depth raycasting — see the repository README for detailed build instructions.

Explore the sample

The sample is organized into three major subsystems that work together to deliver the detection experience.
Each entry below lists the file or scene, what it demonstrates, and the topics it covers:

  • MultiObjectDetection.unity: Main scene with all components integrated. Topics: scene composition, prefab orchestration.
  • DetectionManager.cs: Orchestrates the overall detection flow, handles user input, and manages marker spawning. Topics: state machine (NoPermission → Initial → Running), spatial anchor lifecycle, duplicate prevention.
  • SentisInferenceRunManager.cs: Manages the ML inference loop using Unity Inference Engine. Topics: YOLOv9 model loading, coroutine-based async inference, NMS implementation, tensor conversion.
  • SentisInferenceUiManager.cs: Projects 2D bounding boxes into 3D world space. Topics: viewport-to-ray conversion, depth raycasting, world-space bounding box calculation.
  • EnvironmentRayCastSampleManager.cs: Wraps Depth API raycasting. Topics: environment depth queries, ray-plane intersection.
  • DetectionUiMenuManager.cs: Manages UI states and visual feedback. Topics: UI state transitions, typewriter text effect, blinking indicators.
  • SentisModelEditorConverter.cs: Editor tool to quantize and optimize ONNX models. Topics: uint8 quantization, model-level NMS baking.
  • yolov9sentis.sentis: Pre-converted YOLOv9 model file covering 80 COCO object classes (person, bicycle, car, etc.).

Runtime behavior

When you run the scene, the sample first checks for required permissions. If Scene permission is missing, you see the NoPermission UI state. After granting permissions, the Initial state displays a welcome message with a typewriter animation. Press the A button on your controller or perform an index finger pinch gesture to enter the Running state.

The inference loop begins, displaying real-time bounding boxes around detected objects with class labels overlaid in world space. Press A again or pinch with your index finger to spawn persistent 3D markers at the detected object locations — these appear with animated scaling and billboard labels. Press B or perform a middle finger pinch to clear all spawned markers. The UI continuously displays the model name, detection count, and total identified objects.

Key concepts

Camera-to-ML-to-3D pipeline

The sample implements a complete data flow from camera capture through ML inference to 3D world placement. PassthroughCameraAccess provides the camera frame and pose, which are converted to a tensor and fed to the YOLOv9 inference worker. The worker returns three output tensors — bounding boxes, class IDs, and confidence scores — which are then projected into world space. See SentisInferenceRunManager.cs for the complete inference loop.
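The post-inference step can be sketched in plain Python. This is an illustrative sketch, not the sample's actual C# code: the tensor shapes, function name, and 0.5 confidence cutoff are assumptions, but the idea of zipping the three parallel output tensors into a detection list is the same.

```python
# Sketch: combine YOLO-style output tensors into a list of detections.
# Shapes, names, and the 0.5 cutoff are illustrative assumptions.

def parse_detections(boxes, class_ids, scores, score_threshold=0.5):
    """boxes: list of (x1, y1, x2, y2); class_ids, scores: parallel lists."""
    detections = []
    for box, cls, score in zip(boxes, class_ids, scores):
        if score >= score_threshold:  # drop low-confidence outputs
            detections.append({"box": box, "class_id": cls, "score": score})
    return detections
```

Each surviving detection then carries everything the projection step needs: a 2D box to convert to a ray, and a class ID to label the marker.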

Depth-augmented 2D-to-3D projection

Rather than placing bounding boxes at a fixed distance, the sample uses the Depth API to determine precise 3D positions. For each detection, the sample converts the bounding box center to a world-space ray via PassthroughCameraAccess.ViewportPointToRay() and raycasts against the environment depth map. This produces accurate world-space positions that align with the physical objects the user sees. See SentisInferenceUiManager.cs for the projection math.
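The geometry behind this projection can be sketched with a simple pinhole-camera model. This Python sketch only mirrors the idea behind ViewportPointToRay under an assumed symmetric frustum; the actual API accounts for the real camera intrinsics and pose.

```python
import math

def viewport_point_to_ray(u, v, vertical_fov_deg, aspect):
    """Map a normalized viewport point (u, v in [0, 1]) to a unit
    view-space ray direction for a pinhole camera looking down +Z.
    Simplified sketch of the math behind a ViewportPointToRay call."""
    half_h = math.tan(math.radians(vertical_fov_deg) / 2.0)
    half_w = half_h * aspect
    x = (2.0 * u - 1.0) * half_w   # -half_w at left edge, +half_w at right
    y = (2.0 * v - 1.0) * half_h   # -half_h at bottom, +half_h at top
    length = math.sqrt(x * x + y * y + 1.0)
    return (x / length, y / length, 1.0 / length)  # unit direction

def point_at_depth(origin, direction, depth):
    """World position where the ray hits the surface reported by the
    depth raycast (depth = distance along the ray)."""
    return tuple(o + depth * d for o, d in zip(origin, direction))
```

The bounding box center becomes (u, v), the depth raycast supplies the hit distance, and point_at_depth yields the world position where the marker is placed.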

Coroutine-based async inference

The sample uses Unity coroutines to avoid blocking the main thread during inference result retrieval. After scheduling inference with Worker.Schedule(), it calls ReadbackAndCloneAsync() and yields until completion, keeping the frame rate smooth. A PreloadModel() method runs a dummy inference pass at startup to avoid first-frame latency. See the RunInference() method in SentisInferenceRunManager.cs for the async pattern.
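The yield-until-ready pattern is not Unity-specific and can be illustrated with a Python generator. FakeReadback is a stand-in for the engine's async readback handle; everything here is an analogy to the coroutine flow, not the sample's implementation.

```python
# Analogy to Unity's coroutine + async readback pattern: the coroutine
# yields every frame until the readback completes, so the frame loop
# never blocks. FakeReadback is an illustrative stand-in.

class FakeReadback:
    def __init__(self, frames_needed):
        self.frames_left = frames_needed
    def tick(self):
        self.frames_left -= 1
    def done(self):
        return self.frames_left <= 0

def run_inference(readback):
    while not readback.done():
        yield "waiting"  # hand control back to the frame loop
    yield "result-ready"

def frame_loop(readback):
    co = run_inference(readback)
    states = []
    for state in co:
        states.append(state)
        readback.tick()  # the "engine" advances one frame
    return states
```

The key property is that each frame does a cheap done-check and returns control immediately, which is exactly why the Quest frame rate stays smooth while the GPU readback completes in the background.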

Spatial anchors for world persistence

The sample parents spawned markers to a GameObject with an OVRSpatialAnchor component, preventing content drift as the user moves. It saves the anchor after localization, continuously monitors tracking state, and automatically attempts recovery if tracking is lost. If recovery fails, the anchor is erased and all markers are cleared. See DetectionManager.cs for the anchor lifecycle logic.

Two-stage non-maximum suppression

The sample implements NMS twice: once baked into the model during editor conversion via SentisModelEditorConverter.cs, and again at runtime in C#. The runtime NonMaxSuppression() method filters overlapping bounding boxes by comparing Intersection-over-Union (IoU) scores against a configurable threshold. This two-stage approach balances model efficiency with runtime flexibility. See SentisInferenceRunManager.cs for the C# implementation.
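Greedy NMS with IoU filtering is a standard algorithm, so the runtime stage can be sketched directly. This Python version shows the general technique; the sample's C# NonMaxSuppression() may differ in details such as per-class grouping.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any box that
    overlaps a kept box above the threshold, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Raising the IoU threshold keeps more overlapping boxes (higher recall, more duplicates); lowering it suppresses more aggressively, which is the trade-off the Inspector setting exposes.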

Duplicate marker prevention

When spawning persistent markers, the sample prevents duplicates by checking both class name and spatial overlap. If an existing marker of the same class falls within the new bounding box, the duplicate is skipped. See the SpawnCurrentDetectedObjects() method in DetectionManager.cs for the deduplication logic.
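The deduplication rule described above can be sketched as follows. The "marker center inside the new box" overlap test and all names here are illustrative assumptions; the sample's exact check lives in SpawnCurrentDetectedObjects().

```python
def center(box):
    """Center point of a box in (x1, y1, x2, y2) form."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def is_duplicate(new_class, new_box, markers):
    """markers: list of (class_name, box). A detection counts as a
    duplicate if an existing marker of the SAME class sits inside the
    new bounding box. Overlap test is an illustrative simplification."""
    x1, y1, x2, y2 = new_box
    for cls, box in markers:
        if cls != new_class:
            continue  # different class never counts as a duplicate
        cx, cy = center(box)
        if x1 <= cx <= x2 and y1 <= cy <= y2:
            return True
    return False
```

Requiring both conditions matters: checking overlap alone would merge different objects that happen to be close together, and checking class alone would block a second instance of the same class elsewhere in the room.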

Extend the sample

  • Custom object detection: Train YOLOv9 on your own dataset and convert the model using the editor converter tool in SentisModelEditorConverter.cs
  • Tune detection sensitivity: Adjust the NMS IoU threshold or confidence score filter in the SentisInferenceRunManager Inspector to balance precision and recall
  • Add haptic feedback: Trigger controller vibration when markers are spawned to provide tactile confirmation that an object was recognized

For a simpler introduction to the Passthrough Camera API, explore the CameraViewer or CameraToWorld samples first. For more on combining passthrough camera data with machine learning, see the Passthrough Camera API + Unity Inference Engine guide.