
Unity Sentis For On-Device ML/CV Models

Updated: Nov 3, 2025
This section describes using the Unity Sentis framework with the Passthrough Camera API. Sentis provides a framework for loading models from common open source platforms and compiling them on-device. To learn more about Sentis, see the Sentis product page. This tutorial explains how to set up Sentis and a pretrained YOLO model to identify real objects at runtime on Quest 3 devices.
Passthrough camera using Unity Sentis
After completing this section, the developer should be able to:
  • Recompile and load the YOLO Sample with Unity.
  • Build a new project using Unity and Sentis.
  • Compile and build Sentis models to support models other than YOLO.
  • Integrate this API with the Unity Sentis architecture to access ML/CV models. Note that Unity Sentis is not a required dependency; many developers will choose to access different or proprietary frameworks. It is provided as a convenience so developers can experiment quickly.

Use cases

The framework described in this section can be followed to load the ML/CV model of your choice. Note that the complexity of the models available online differs widely, and you will need to find a model that meets the performance profile required by your application. Unity also provides samples that can be adapted for other tasks, such as digit recognition.

Prerequisites

  • Horizon OS: v74
  • Devices: Quest 3 or Quest 3S
  • Grant Camera and Spatial data permissions to your app. The Spatial data permission is used by the EnvironmentRaycastManager.
  • Install Unity 2022.3.58f1 with Sentis package 2.1.1 (com.unity.sentis)

Known issues

  1. Model accuracy is not 100%, because the model has been optimized to run on devices like Quest 3.
  2. The model was trained on 80 classes (not individual objects), which means several object types can fall into the same class; for example, monitors and TVs both belong to the TV_Monitor class. The table in the next section lists the classes the model can identify.
  3. Some classes are hard to identify. For example, cell phones are difficult to detect; in most cases they are identified as the TV_Monitor class.
  4. Quest 3 controllers might be unrecognized or mislabeled as remote controllers.
  5. The bounding box visualization in the MultiObjectDetection sample doesn't perfectly align with the detected objects. For a better example of camera-to-world projection, refer to the CameraToWorld sample. We will update this sample in a future version.

YOLO Sample

This sample draws 2D boxes around the detected objects and spawns a marker in the approximate 3D position of each object when the user presses the A button.

Sample details

  • Name: MultiObject Detection Sample
  • Unity version: Unity 2022.3.58f1
  • Unity Sentis: 2.1.1
  • MRUK version: 74.0.0 or higher

Build instructions

  1. Clone the Unity-PassthroughCameraApiSamples repository from GitHub.
  2. Open Unity-PassthroughCameraApiSamples in the Unity Editor.
  3. In Build Profiles, switch the Active Platform to Android.
  4. Open the sample scene: Assets/PassthroughCameraApiSamples/MultiObjectDetection/MultiObjectDetection.unity
  5. In Meta->Tools->Project Setup Tool, if the rule “MR Utility Kit recommends Scene Support to be set to Required” appears, choose “...”-> Ignore. Fix and apply all other issues and recommendations.
  6. Build the app and test it on your headset.

Preview the scene

  • Description: This sample shows how to identify multiple objects using Sentis and a pretrained YOLO model.
  • Controls: This sample uses the Quest 3 controllers:
    • Menus (Start and Pause):
      • Button A: start playing
    • In Game:
      • Button A: place a marker at the world position of each detected object
    • At any moment:
      • Button MENU: back to Samples selection.
  • How to play:
    • Start the application and look around you.
    • When an object is detected, you will see 2D floating boxes around the detected objects.
    • If you press the A button, a 3D marker labeled with the class name will be placed at the real-world position of each detected object.
    • This model can identify the following objects (80):
person, fire hydrant, elephant, skis, wine glass, broccoli, dining table, toaster, bicycle, stop sign, bear, snowboard, cup, carrot, toilet, sink, car, parking meter, zebra, sports ball, fork, hot dog, tv monitor, refrigerator, motorbike, bench, giraffe, kite, knife, pizza, laptop, book, aeroplane, bird, backpack, baseball bat, spoon, donut, mouse, clock, bus, cat, umbrella, baseball glove, bowl, cake, remote, vase, train, dog, handbag, skateboard, banana, chair, keyboard, scissors, truck, horse, tie, surfboard, apple, sofa, cell phone, teddy bear, boat, sheep, suitcase, tennis racket, sandwich, potted plant, microwave, hair drier, traffic light, cow, frisbee, bottle, orange, bed, oven, toothbrush
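The Label Asset referenced later by the inference scripts is essentially a lookup from class index to one of these names. A minimal Python sketch of that lookup (the label order shown here is illustrative; the real asset defines the authoritative index-to-name order for the model you ship):

```python
# Sketch: mapping a YOLO class index to its label name.
# The order below is illustrative; the real Label Asset defines the
# authoritative index-to-name order for the shipped model.
LABELS = [
    "person", "bicycle", "car", "motorbike", "aeroplane", "bus",
    "train", "truck", "boat", "traffic light",
]

def class_name(class_id: int) -> str:
    """Return the label for a class index, or 'unknown' if out of range."""
    if 0 <= class_id < len(LABELS):
        return LABELS[class_id]
    return "unknown"

print(class_name(0))    # person
print(class_name(999))  # unknown
```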

Unity project structure

This section describes the components used in the MultiObjectDetection.unity scene.
MultiObjectDetection project structure
The sample scene contains all the prefabs needed to run the gameplay. Below we describe each one:
  • [BuildingBlock] Camera Rig: we created this object using Meta XR Building Blocks tool. This gameObject contains additional prefabs:
    • As child of CenterEyeAnchor:
      • DetectionUiMenuPrefab: manages and shows the UI of the sample.
  • [BuildingBlock] Passthrough: Meta XR Building Blocks entity to configure and enable the Passthrough feature.
  • DetectionManagerPrefab: contains the scanner logic to get the camera data and run the Sentis inference to update the UI elements.
  • SentisObjectsDetectedUiPrefab: contains the UI canvas to draw the boxes around the detected objects.
  • SentisInferenceManagerPrefab: contains the multi-object detection inference and UI logic.
  • EnvironmentRaycastPrefab: contains the logic to use the MRUK Raycast to get the real world 3D point using the Quest Depth Data.
  • PassthroughCameraAccessPrefab: creates a PassthroughCameraAccess component and manages its settings.
  • ReturnToStartScene: common prefab used to go back to Samples selection.

Sample Detection Manager Prefab

The main prefab that manages the logic of this sample is DetectionManagerPrefab.
DetectionManagerPrefab in the project hierarchy
This prefab contains the following components:
  • DetectionManager.cs: This script contains the sample logic. It gets the camera image from the PassthroughCameraAccessPrefab and sends it to Sentis inference. It also manages the placement action when the user presses the A button.
Detection Manager script

AI models used on this project

This project uses Unity Sentis (2.1.1) to run AI models locally on Quest 3 devices. Below you can find details about each AI model used with Sentis, including the input data it consumes and the output it returns.

Multi-object detection model

To identify real-world objects, we use the Yolov9 model prepared for Sentis with an extra layer (Non-Max Suppression) and quantized to Uint8. This model requires just an image as input and returns the 2D coordinates and the class type of the detected objects.

Technical information

  • Model name: Yolov9t
  • Model format: Sentis, quantized to Uint8
  • Model size: 6,284 KB
  • Classes trained: 80 different classes

YOLO Control Scripts

SentisInferenceRunManager.cs: contains the logic to run Sentis inference using Sentis 2.1.1.
Sentis inference run manager
This component has the following parameters:
  • Backend: the local backend used by Sentis to run the inferences.
  • Sentis Model: the model asset (.sentis) used for the inference.
  • K_Layers Per Frame: the number of layers executed per frame.
  • Label Asset: a file containing the list of classes detected by the pretrained YOLO model.
  • UIInference: a reference to the SentisInferenceUiManager script.
SentisInferenceUiManager.cs: contains the logic to draw the UI box around the tracked object.
Sentis inference UI manager
This component has the following parameters:
  • Display Image: the reference to the UI element that will contain the 2D bounding boxes.
  • Box Texture: the texture used by the 2D box.
  • Box Color: the color of the bounding box.
  • Font: the font used to write the detected object information.
  • Font Color: the color of the object information.
  • Font Size: the size of the font used to show the object information.
  • On Object Detected event: event raised when YOLO detects an object.
Input values: a TensorFloat with the current camera RGB raw data image.
  • The Sentis framework provides a function to convert a texture to a float tensor.
Output values: the model returns 2 tensors:
  • A TensorFloat of screen coordinates (X, Y) for all detected objects.
  • A TensorInt with the class ID of each detected object.
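To make the two output tensors concrete, here is a hedged Python sketch of the post-processing step: pairing each box-coordinate row with its class ID to produce labeled detections. The tensor layout (one box row per detection, matched by index to a class ID) is an assumption for illustration; check the actual output shapes of the model you load.

```python
# Sketch: pair YOLO+NMS output tensors into labeled detections.
# Assumed layout (illustrative): boxes is a list of box-coordinate rows in
# normalized image coordinates; class_ids is the matching list of ints.
def pair_detections(boxes, class_ids, labels):
    """Zip box rows with class IDs into (label, box) tuples."""
    detections = []
    for box, cls in zip(boxes, class_ids):
        name = labels[cls] if 0 <= cls < len(labels) else "unknown"
        detections.append((name, tuple(box)))
    return detections

labels = ["person", "bicycle", "car"]
boxes = [(0.50, 0.40, 0.10, 0.20), (0.25, 0.70, 0.05, 0.10)]
class_ids = [2, 0]
print(pair_detections(boxes, class_ids, labels))
# [('car', (0.5, 0.4, 0.1, 0.2)), ('person', (0.25, 0.7, 0.05, 0.1))]
```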
Sentis object detection in passthrough
Once we have the object and its coordinates (X and Y percentage of the input image), we use the Depth placement with MRUK Raycast to get the “real world” position.

ONNX to SENTIS converter

This sample includes editor code to convert the Yolov9 ONNX model to Sentis format.
The Sentis Inference Run Manager component lets you adjust the Iou and Score thresholds for the YOLO model and generate a .sentis file with the Non-Max Suppression layer, quantized to UInt8.
ONNX to SENTIS converter
  • Iou Threshold: the intersection-over-union threshold used by Non-Max Suppression to discard overlapping detections of the same object.
  • Score Threshold: the minimum confidence score a detection must reach to be kept.
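To clarify what these two thresholds control, here is a minimal Python sketch of intersection-over-union between two axis-aligned boxes and the greedy Non-Max Suppression pass that uses both thresholds. The (x1, y1, x2, y2) box format and greedy algorithm are the standard formulation, shown for illustration; the exact kernel the converter bakes into the model may differ.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5, score_threshold=0.25):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] — box 1 overlaps box 0 too much
```

Raising the Iou threshold keeps more overlapping boxes; raising the Score threshold drops low-confidence detections entirely.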
The Sentis functions to quantize a model and save it are located inside the SentisModelEditorConverter.cs script.
. . .
ModelQuantizer.QuantizeWeights(QuantizationType.Uint8, ref m_model);
ModelWriter.Save(FILEPATH, m_model);
. . .
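Conceptually, Uint8 weight quantization maps each float weight onto the 0-255 range using a per-tensor scale and offset; inference then dequantizes on the fly. This Python sketch illustrates the idea only; Sentis's actual quantization scheme may differ in detail.

```python
def quantize_uint8(weights):
    """Map float weights to uint8 using a per-tensor scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo  # lo acts as the dequantization offset

def dequantize(q, scale, offset):
    """Recover approximate float weights from uint8 values."""
    return [v * scale + offset for v in q]

w = [-1.0, 0.0, 0.5, 1.0]
q, scale, offset = quantize_uint8(w)
approx = dequantize(q, scale, offset)
# Each recovered weight is within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(w, approx))
```

The payoff is a ~4x smaller file than float32 weights, which is why the converted .sentis model loads faster on device.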

Depth placement with MRUK Raycast

To place a marker at the real-world position using the 2D coordinates of each detected object, we use the MRUK Environment Raycasting (Beta) feature. This feature allows us to perform a raycast from the 2D position into the real world using the Quest 3 depth map data.
The EnvironmentRaycast prefab contains the two components used to perform the depth raycast.
EnvironmentRaycast prefab
  • MRUKCastManager class: this component contains the logic to perform the MRUK depth raycast using the UI position of each detected object.
  • Environment Raycast Manager: this script is provided by MRUK and contains the logic to perform the raycast against the internal depth map generated by the Quest 3 sensors.

MRUK Ray Cast Manager class

The MRUKRayCastManager.cs is the class that contains the function to get the Transform data of the real world point using the 2D coordinates from the detected objects in the camera image.
public Transform PlaceGameObject(Vector3 cameraPosition)
{
    // "Camera" is a Transform field referencing the center eye anchor.
    // Aim this transform from the headset toward the target position.
    this.transform.position = Camera.position;
    this.transform.LookAt(cameraPosition);

    // Cast a ray along that direction against the environment depth map.
    var ray = new Ray(Camera.position, this.transform.forward);
    if (RaycastManager.Raycast(ray, out EnvironmentRaycastHit hitInfo))
    {
        // Snap to the hit point, oriented along the surface normal.
        this.transform.SetPositionAndRotation(
            hitInfo.point,
            Quaternion.LookRotation(hitInfo.normal, Vector3.up));
    }

    return this.transform;
}
The PlaceGameObject function performs a raycast from the camera position (center eye anchor) toward the world position of each UI box drawn around an object detected by the YOLO model.
This class uses EnvironmentRaycastManager from the Meta.XR.MRUtilityKit package to perform the raycast.
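The geometry behind this placement is straightforward: turn the 2D detection into a direction from the headset, then cast along it until the depth map is hit. A hedged Python sketch of the first half, converting normalized image coordinates into a pinhole-camera ray direction (the field of view and aspect ratio here are illustrative placeholders, not the real passthrough camera intrinsics):

```python
import math

def ray_direction(u, v, fov_y_deg=72.0, aspect=4 / 3):
    """Convert normalized image coords (u, v in 0..1, origin top-left)
    into a unit camera-space ray direction for an assumed pinhole camera."""
    # Half-extents of the image plane at distance 1 from the camera.
    half_h = math.tan(math.radians(fov_y_deg) / 2.0)
    half_w = half_h * aspect
    x = (u - 0.5) * 2.0 * half_w  # right
    y = (0.5 - v) * 2.0 * half_h  # up (image v grows downward)
    z = 1.0                       # forward
    norm = math.sqrt(x * x + y * y + z * z)
    return (x / norm, y / norm, z / norm)

# The image center maps to the straight-ahead direction.
print(ray_direction(0.5, 0.5))  # (0.0, 0.0, 1.0)
```

In the sample, MRUK's EnvironmentRaycastManager handles the second half: intersecting such a ray with the depth map to return the hit point and surface normal.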

How to generate your own YOLO ONNX model

The YOLO model used in this sample is the yolov9-t-converted.pt model from the MultimediaTechLab/YOLO GitHub repo, converted to ONNX.
If you want to export any YOLO model, follow the steps below to generate the ONNX file:
  • Open a console on your Windows or macOS machine.
  • Create a conda environment (Install conda first if you don’t have it)
    # Create Conda environment
    conda create -n "myenv" python=3.10
    # Activate conda environment
    conda activate myenv
    
  • Install pytorch
    # Install pytorch
    pip3 install torch torchvision torchaudio
    
  • Install Ultralytics package
    # Install the ultralytics package from PyPI
    pip install ultralytics
    
  • Create a new python script (yolo.py) and add the following code inside
    from ultralytics import YOLO
    # Load a pretrained YOLO model (recommended for training)
    model = YOLO("yolov8n.pt")
    # Train the model using the 'coco8.yaml' dataset for 3 epochs
    results = model.train(data="coco8.yaml", epochs=3)
    # Evaluate the model's performance on the validation set
    results = model.val()
    # Export the model to ONNX format
    success = model.export(format="onnx")
    
  • Run the python script to generate the Yolov8n.onnx file.
    python yolo.py
    
  • This script will show you the path of your ONNX file when it finishes:
    // ending yolo.py output sample
    . . .
    ONNX: starting export with onnx 1.17.0 opset 19...
    ONNX: slimming with onnxslim 0.1.39...
    ONNX: export success ✅ 0.8s, saved as 'runs/detect/train/weights/best.onnx' (12.2 MB)
    Export complete (1.0s)
    Results saved to /Users/username/yolo/runs/detect/train/weights
    Predict:         yolo predict task=detect model=runs/detect/train/weights/best.onnx imgsz=640
    Validate:        yolo val task=detect model=runs/detect/train/weights/best.onnx imgsz=640 data=/opt/anaconda3/envs/Executorch/lib/python3.10/site-packages/ultralytics/cfg/datasets/coco8.yaml
    Visualize:       https://netron.app
    . . .
    

Building a new project with Sentis

This section describes the steps necessary to build your own project using Unity and Sentis for use with the Passthrough Camera API.
  1. Download the Unity Passthrough Camera API samples repository and copy this folder into a new project: Assets/Samples/PassthroughCamera.
  2. Use Unity Package Manager UI to install Sentis 2.1.1 into your project (com.unity.sentis).
  3. Run Meta->Tools->Project Setup Tool and fix any issues that it finds in the configuration of your project.
  4. Create a new empty scene and ensure it doesn’t have any Camera component.
  5. Add “Event System” to your scene.
  6. Use Meta ->Tools->Building Blocks to add Camera Rig.
  7. To run a different AI model, you can follow a similar structure from our Sentis sample:
    1. Check the content from the Assets/PassthroughCameraApiSamples/MultiObjectDetection folder.
    2. Drag the following prefabs into the project hierarchy:
      1. DetectionManager
      2. SentisObjectsDetectedUIPrefab
      3. SentisInferenceManagerPrefab
      4. EnvironmentRaycastPrefab
      5. PassthroughCameraApiPrefab
    3. Drag the DetectionUIMenuManager prefab under the CenterEyeAnchor and drag the CenterEyeAnchor camera to the Event Camera field of the Canvas.
    4. Iterate through the prefabs and ensure the fields are populated as in the sample package
    5. Build and run to verify it works.
  8. The loading and processing of the Sentis model are managed by the following scripts:
    1. DetectionManagerPrefab See Sample Detection Manager Prefab in the previous section.
      1. Within DetectionManager.cs, the Update function gets the CPU texture data from the WebCamTexture object (via the PassthroughCameraApiPrefab reference) and kicks off the inference by calling RunInference(captureBuffer) once the WebCamTexture object is ready. This process checks for a camera texture and waits for the current inference to complete before starting a new one. This is a standard approach that you can reuse with other ML models.
    2. SentisInferenceManagerPrefab contains the critical features for running Sentis and can be used as a design pattern for your Sentis models. See YOLO Control Scripts in the previous section.
      1. The public variables here allow you to configure the backend for Sentis to run on either the GPU or the CPU, and to select the Sentis model that you plan to use (Yolov9 in our case). See How to generate your own YOLO ONNX model for background on how to create these models; ONNX is the standard format accepted by Sentis. Finally, they allow you to set the layers per frame.
      2. SentisInferenceRunManager.cs controls the Sentis model. This includes LoadModel(), which loads the Sentis model, compiles its graph, and spawns the Worker that will execute the model. Most importantly, it contains a public RunInference(Texture2D targetTexture) function which converts the input camera image into the tensor consumed by the model and kicks off the inference.
    3. Modify the Sentis inference based on your model requirements:
      1. Input data will be the same (texture to tensor). The RunInference(Texture2D targetTexture) function converts the texture to a tensor.
      2. Output data will be different, so you will need to write your own post-processing scripts. The GetInferencesResults() function shows how to get the output of the model using an asynchronous readback request.
      3. We recommend the layer-by-layer inference technique and the asynchronous readback function for reading the output, to get better performance on Quest. The InferenceUpdate() function shows how to do the layer-by-layer inference.
  9. The UI will likely be very different for the model that you might choose, but the following is an explanation of what we used in our sample.
    1. SentisInferenceUiManager.cs is the script that updates the UI based on the output of the model. You will need to create a script specific to the model you choose, to interpret its output for your own application. In the case of YOLO, we use DrawUIBoxes(output, labelDs, imageWidth, imageHeight).
    2. DetectionUIMenuManager prefab is under the CenterEyeAnchor and manages the display of the initial and start panels that you see as the app launches along with the countdown.
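The layer-by-layer inference technique used in the steps above (the K_Layers Per Frame parameter) can be sketched as a generator that runs a fixed number of layers, then yields control back to the frame loop, so a full inference is spread across several frames instead of blocking one. The layer list and step count below are placeholders, not the Sentis API.

```python
def schedule_layers(layers, layers_per_frame):
    """Yield once per frame after executing layers_per_frame layers,
    spreading a full inference across several frames."""
    executed = []
    for i in range(0, len(layers), layers_per_frame):
        for layer in layers[i:i + layers_per_frame]:
            layer(executed)       # run one layer of the network
        yield list(executed)      # hand control back to the frame loop

# Placeholder "layers" that just record their own index when run.
layers = [lambda log, i=i: log.append(i) for i in range(7)]
frames = list(schedule_layers(layers, layers_per_frame=3))
print(len(frames))   # 3 frames: layers 0-2, then 3-5, then 6
print(frames[-1])    # [0, 1, 2, 3, 4, 5, 6]
```

In the Unity sample the same pacing is driven from Update(), advancing the scheduled work a little each frame.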

Recommendations for using Sentis on Meta Quest devices

  • Model architecture:
    • Use models that do not require complex architectures. Large generative models and LLMs will not perform well on devices like Meta Quest.
    • Sentis runs on the main thread of your Unity app, so it will impact other main-thread work such as the render pipeline.
      • Use the layer-by-layer inference technique (split the inference across frames, running a few layers per frame) so you do not block the main thread.
  • Model size:
    • The model is loaded to the CPU or the GPU of your device. This can cause multiple issues when you try to use big models:
      • Long loading times
      • Lag on the main thread during the first inference
      • Reduction of the memory budget for other resources
    • We recommend:
      • Use the smallest version of the model that fits your needs. For example, in the MultiObject detection sample we use the smallest YOLO variant (~8 MB), because the medium YOLO variant (146 MB) performs worse.
      • Convert the model to Sentis format and quantize it to Uint8 to reduce loading times.
  • GPU versus CPU:
    • It’s important to choose the right strategy to minimize transferring data between GPU and CPU contexts:
      • If you are using an AI model to perform graphics-related work, keep all the processing on the GPU. Select the GPUCompute backend for Sentis and use the procedures that operate on the GPU to send camera data directly to Sentis. Finally, if you don’t need to access the output of the model on the CPU, you can send it directly to your shader or material.
      • If you need to send the output to the CPU, then you can:
        • Run the model on the GPU, then send results asynchronously to the CPU.
        • Run the model on the CPU.
      • We recommend reading the output data asynchronously on any backend to prevent blocking the main thread. Sentis provides different techniques to accomplish this. To learn more, see Read output from a model asynchronously.
  • NPU backend on Sentis:
    • Currently, Sentis does not use any NPU or hardware acceleration to run inference. Sentis is part of the Unity Engine and is designed to run on multiple platforms; on Quest devices, Sentis runs as it would on a regular Android platform, with no device-specific acceleration. It is important to keep this in mind when selecting a model.