Use ovrgpuprofiler for GPU Profiling

ovrgpuprofiler is a performance monitoring CLI tool for Meta Quest headsets that developers can use to access a range of real-time GPU metrics and perform render stage traces. ovrgpuprofiler is included with the Meta Quest runtime and does not need to be manually installed.

Use ovrgpuprofiler to Retrieve Real-time Metrics

Open a shell via ADB on a connected Meta Quest before using ovrgpuprofiler. If you are not working in a device shell, prefix every command in this topic with adb shell, as in adb shell <command>.
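If you prefer to stay on the host machine, the adb shell prefix can be wrapped in a small function. This is a minimal sketch; the name quest_gpu is hypothetical and chosen only for illustration.

```shell
# Host-side wrapper: forwards its arguments to ovrgpuprofiler on the headset.
# quest_gpu is a hypothetical name used for this sketch, not part of the tool.
quest_gpu() {
    adb shell ovrgpuprofiler "$@"
}
```

With this defined, quest_gpu -m on the host is equivalent to entering ovrgpuprofiler -m in a device shell.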

Get Metrics List

To list all supported real-time metrics and their ID numbers, enter the following from the command line while a Meta Quest is connected via ADB:

ovrgpuprofiler -m

The beginning of the output for this command looks like the following:

    47 metrics supported:
    1       Clocks / Second
    2       GPU % Bus Busy
    3       % Vertex Fetch Stall
    4       % Texture Fetch Stall
    5       L1 Texture Cache Miss Per Pixel
    6       % Texture L1 Miss
    7       % Texture L2 Miss
    8       % Stalled on System Memory
    9       Pre-clipped Polygons/Second
    10      % Prims Trivially Rejected
    11      % Prims Clipped

Note: The metric count shown in this example output (47) is specific to the device and runtime version used when the output was captured. The actual number of supported metrics varies by device and runtime version.

Alternatively, ovrgpuprofiler -m -v provides the same list with more verbose descriptions for each metric.
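Because metric IDs can differ between devices and runtime versions, it can help to look IDs up by name rather than hard-coding them. The following sketch filters the non-verbose list format shown above; the function name find_metric is hypothetical, and the parsing assumes each metric line starts with a numeric ID followed by the name.

```shell
# Reads `ovrgpuprofiler -m` output on stdin and prints "ID<TAB>name" for each
# metric whose name contains the given text (case-insensitive, literal match).
# find_metric is a hypothetical helper; it assumes the non-verbose list format.
find_metric() {
    awk -v query="$1" '
        # Metric lines start with a numeric ID; skip the "N metrics supported:" header.
        $1 ~ /^[0-9]+$/ && $0 !~ /supported:[ \t]*$/ {
            id = $1
            name = $0
            sub(/^[ \t]*[0-9]+[ \t]+/, "", name)   # drop the leading ID column
            if (index(tolower(name), tolower(query)) > 0)
                printf "%s\t%s\n", id, name
        }'
}
```

For example, adb shell ovrgpuprofiler -m | find_metric texture lists the texture-related metrics, and appending | cut -f1 | paste -sd, - joins their IDs into a comma-separated string.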

Get Metric Data

To retrieve data for a metric, the command takes the following format:

ovrgpuprofiler -r<metric ID number>

For example, to retrieve the metric % Texture Fetch Stall (ID number 4), enter ovrgpuprofiler -r4. The console prints data every second until you press Ctrl-C.

Get Data for Multiple Metrics

You can also request multiple metrics at once by separating ID numbers with commas in a string, such as ovrgpuprofiler -r"4,5,6". The following shows output from ovrgpuprofiler -r"4,5,6":

$ ovrgpuprofiler -r"4,5,6"
% Texture Fetch Stall                      :           2.449
L1 Texture Cache Miss Per Pixel            :           0.124
% Texture L1 Miss                          :          20.338

% Texture Fetch Stall                      :           2.369
L1 Texture Cache Miss Per Pixel            :           0.122
% Texture L1 Miss                          :          20.130

% Texture Fetch Stall                      :           2.580
L1 Texture Cache Miss Per Pixel            :           0.127
% Texture L1 Miss
...

Note: Do not request more than 30 real-time metrics simultaneously.
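Per-second samples are easier to compare between runs once they are averaged. The sketch below assumes the output has been saved to a file (for example, by redirecting the command's output and stopping it with Ctrl-C) and that each sample line has the "name : value" layout shown above. The function name average_metrics is hypothetical.

```shell
# Reads saved real-time output on stdin and prints "name<TAB>mean" per metric.
# Lines without a trailing ": <number>" (prompts, blanks, truncated lines) are skipped.
average_metrics() {
    awk '
        {
            # Locate the final colon that is followed only by a number.
            pos = match($0, /:[ \t]*-?[0-9]+(\.[0-9]+)?[ \t]*$/)
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            value = substr($0, pos + 1) + 0
            if (!(name in count)) order[++n] = name   # remember first-seen order
            sum[name] += value
            count[name]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%.3f\n", order[i], sum[order[i]] / count[order[i]]
        }'
}
```

A typical use would be average_metrics < metrics.log, where metrics.log is whatever file name you redirected the output to.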

Use ovrgpuprofiler for Render Stage Tracing

ovrgpuprofiler supports render stage GPU tracing at a per-tile level. Unlike direct-mode (immediate-mode) GPUs, which execute draw calls sequentially, tile-based renderers batch draw calls for an entire surface. That surface is then split into tiles that are computed sequentially, where each tile executes all the draw calls that touched that tile. ovrgpuprofiler can tell you how much time was spent in each rendering stage for each surface rendered during a trace’s duration.

Prepare for Render Stage Tracing

Tracing on a tile-per-tile level requires the app’s GPU context to be in detailed GPU profiling mode. To set the OS to start subsequent apps in detailed GPU profiling mode, enter the following command:

ovrgpuprofiler -e

If an app is running when the command is entered, it must be restarted for the change to take effect.

ovrgpuprofiler -i shows whether detailed GPU profiling mode is currently enabled. Use ovrgpuprofiler -d to disable it.

In addition, apps being used with ovrgpuprofiler may require the <uses-permission android:name="android.permission.INTERNET" /> permission in their manifest.

Note: Detailed GPU profiling incurs an approximately 10% overhead in GPU rendering times. Keep this overhead in mind when reading trace output.

Note: The INTERNET permission requirement may vary by runtime version. Verify that this permission is required for your specific setup and target runtime.

Execute a Trace

To execute a 100 ms trace on the currently running app, enter the following:

ovrgpuprofiler -t

Trace length can be specified in seconds by including a number with the -t argument. For example, ovrgpuprofiler -t1.2 would run a trace for 1.2 seconds.

The output of the trace is printed to the console, listing the surfaces rendered during the trace along with render stage information.
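The preparation and trace steps can be chained from the host so that a capture is repeatable. The following is a sketch under stated assumptions: capture_trace and WARMUP_SECONDS are hypothetical names, the package name is supplied by you, and the app is assumed to expose a LAUNCHER activity that the standard Android monkey tool can start. If that does not hold for your app, relaunch it manually instead.

```shell
# Hypothetical helper: capture_trace <package> <seconds> <output-file>
capture_trace() {
    ct_pkg=$1
    ct_seconds=$2
    ct_out=$3

    # Enable detailed profiling, then restart the app so its GPU context picks it up.
    adb shell ovrgpuprofiler -e
    adb shell am force-stop "$ct_pkg"
    # Assumption: the app has a LAUNCHER activity that monkey can start.
    adb shell monkey -p "$ct_pkg" -c android.intent.category.LAUNCHER 1 >/dev/null

    # Give the app time to reach the scene you want to measure.
    sleep "${WARMUP_SECONDS:-10}"

    # Run the trace and save the console output on the host.
    adb shell ovrgpuprofiler -t"$ct_seconds" > "$ct_out"

    # Turn detailed profiling off again so later runs do not pay its overhead.
    adb shell ovrgpuprofiler -d
}
```

For example, capture_trace com.example.app 1.2 trace.txt would save a 1.2-second trace to trace.txt, where com.example.app stands in for your own package name.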

Read a Trace

Lines from the trace output look like the following:

    Surface 1    | 1216x1344 | color 32bit, depth 24bit, stencil 0 bit, MSAA 4 | 60  128x224 bins | 5.08 ms | 130 stages :  Binning : 0.623ms Render : 1.877ms StoreColor : 0.309ms Blit : 0.002ms Preempt : 1.286ms

This shows that Surface 1 has a resolution of 1216x1344, 32-bit color, 24-bit depth, and uses MSAA 4. The surface was broken down into 60 tiles/bins with a size of 128x224, and it took 5.08 ms to render in total. There were 130 render stage executions in the process, and the remaining fields show how much time was spent in each render stage. Not every render stage appears for each surface.
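When a trace contains many surfaces, it can be convenient to flatten each line into one row per render stage. The sketch below assumes the pipe-separated layout of the example line above and has not been checked against other output variants; parse_surfaces is a hypothetical name.

```shell
# Reads trace output on stdin and prints one row per render stage:
#   surface<TAB>total_ms<TAB>stage<TAB>stage_ms
parse_surfaces() {
    awk -F'|' '
        /^[ \t]*Surface/ && NF >= 6 {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            total = $5
            gsub(/[^0-9.]/, "", total)          # " 5.08 ms " -> "5.08"
            # The last field holds "N stages : Stage : X.XXXms ..." tokens.
            n = split($6, tok, /[ \t]+/)
            for (i = 1; i + 2 <= n; i++) {
                if (tok[i + 1] == ":" && tok[i + 2] ~ /^[0-9.]+ms$/) {
                    ms = tok[i + 2]
                    sub(/ms$/, "", ms)
                    printf "%s\t%s\t%s\t%s\n", name, total, tok[i], ms
                }
            }
        }'
}
```

For the example line above, this produces five rows, from Binning (0.623) to Preempt (1.286), each tagged with Surface 1 and its 5.08 ms total. The tab-separated rows can then be sorted or loaded into a spreadsheet.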

Multiview Surface Output by Device

On Meta Quest, ovrgpuprofiler outputs one surface line per slice for multiview apps. This means that there is one surface for each eye. To get the total frame time, add the render times of the two eye surfaces.

On Meta Quest 2, however, ovrgpuprofiler outputs one surface line for both views of the surface, due to how the Adreno 650v3 GPU processes multiview commands (Hardware Multiview). On Meta Quest 2, bins of multiview surfaces are shared between both views. Therefore, the following output:

135 96x176 bins

should be interpreted as:

135 96x176x2 bins

Meta Quest 3 and Meta Quest 3S both use the Adreno 740v3 GPU (Meta Quest 3 at 690 MHz, Meta Quest 3S at 492 MHz). Whether ovrgpuprofiler outputs one surface line per eye (as on Meta Quest) or one surface line for both views (as on Meta Quest 2 with Hardware Multiview) has not been verified for these devices. Refer to the release notes for your runtime version or contact developer support for confirmation.

Render stages that can appear in trace output include the following:

Binning - The Meta Quest’s GPU uses a tiled architecture, meaning that all draw calls for a frame are executed in two stages. The first stage is the binning phase, where triangle vertex positions for all draw calls are calculated and assigned to bins that correspond to a partition of the drawing surface.

Render - This is the second stage of the draw call that began with binning. One chunk of this represents the total cost of all vertex and fragment operations for one bin. A simplified version of the vertex shaders is executed during binning for the purpose of finding a triangle’s position. The full version of the vertex shaders is re-executed to compute the interpolants used by the fragment shader during this stage.

LoadColor - Loads the color from slow memory into fast memory. This can happen when starting to render into a surface without clearing it.

StoreColor - After all pixel and fragment operations for a bin have finished executing, the calculated color values are copied from fast memory (dedicated to the bin’s rendering operations) to slow memory.

Blit - Copying between slow memory regions. This can happen for various operations, such as mipmap generation and when clearing a surface without rendering anything.

Preempt - The compositor is an OS-level service that executes at regular intervals to present the image submitted by the application to the screen. To deliver the image at the proper cadence, the GPU preempts the application’s workload so the compositor can complete its work on time.

Use ovrgpuprofiler for Per-Draw Metrics

In addition to per-surface render stage tracing, ovrgpuprofiler can sample GPU metrics on a per-draw-call basis within each surface. This is useful for identifying which draw calls dominate a frame’s GPU cost or for comparing the relative cost of individual draws.

Per-draw metrics use a separate metric catalog from real-time metrics: the available IDs and their descriptions are not the same.

Prepare for Per-Draw Tracing

Per-draw tracing requires the app’s GPU context to be in detailed GPU profiling mode. Follow the same setup as for render stage tracing by entering ovrgpuprofiler -e and restarting the target app.

List Per-Draw Metrics

To list the per-draw metrics supported on the connected device, enter:

ovrgpuprofiler -x -m

The output lists each metric ID along with its name. Pass -v for more verbose descriptions. As with real-time metrics, the available set varies by device and runtime version.

Capture a Per-Draw Trace

To capture a per-draw trace, combine the -x (draw call) flag with -t (trace), passing a comma-separated list of per-draw metric IDs to record:

ovrgpuprofiler -t3 -x="1,2,3,4,12,15,16,17,18,22,27,28,29,38,49,50"

In this example, -t3 runs a 3-second trace (the default trace length is 0.1 seconds) and the -x argument selects 16 per-draw metric IDs to sample.

Note: The number of metrics that can be requested in a single trace pass is limited by the GPU hardware. Requesting more metrics than the device supports per pass can cause some metric values to be missing from the output. If you need more metrics than fit in a single pass, run multiple traces with different metric subsets.
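Running several traces with different metric subsets can be scripted. The following sketch splits an ID list into fixed-size batches and runs one trace per batch from the host. The function names and the batch size are hypothetical: the per-pass hardware limit is not stated here, so pick a size that produces complete output on your device.

```shell
# Split a comma-separated ID list into batches of at most <size> IDs,
# printing one comma-separated batch per line.
chunk_ids() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -v size="$2" '
        NF {
            if (count > 0) buf = buf "," $1
            else buf = $1
            count++
            if (count == size + 0) { print buf; buf = ""; count = 0 }
        }
        END { if (count > 0) print buf }'
}

# Run one per-draw trace per batch: trace_in_batches <ids> <size> <seconds>
trace_in_batches() {
    chunk_ids "$1" "$2" | while IFS= read -r tb_batch; do
        echo "== per-draw metrics: $tb_batch =="
        # Redirect stdin so adb does not consume the remaining batch lines.
        adb shell ovrgpuprofiler -t"$3" -x="$tb_batch" </dev/null
    done
}
```

For example, trace_in_batches "1,2,3,4,12,15,16,17" 4 3 would run two 3-second traces with four metrics each. Because each batch is a separate trace, compare draws within a batch rather than across batches.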

Read a Per-Draw Trace

The output groups results by process and command buffer. For each draw call, the trace prints a header line with the frame number, draw number, and an optional draw label, followed by the LRZ (low-resolution Z) state and the value of each requested metric:

Tracing for 3.0 seconds...
Process com.YourApp has 30 commandbuffers
    Frame 1   : Draw 1   Label 0xffffffff
        LRZ State: TestEnabled, WriteEnabled <0x03>
        Clocks                                     :       58912.000
        % Vertex Fetch Stall                       :           0.061
        % Shaders Busy                             :          39.907
        Fragments Shaded                           :           3.203
    Frame 1   : Draw 2   Label 0xffffffff
        LRZ State: TestEnabled, WriteEnabled <0x03>
        Clocks                                     :       44008.000
        % Vertex Fetch Stall                       :           0.000
        ...

Note: Per-draw timings are best used as a relative comparison between draws within the same trace. Per-draw measurement introduces pipeline stalls that affect absolute timings.
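Since per-draw values are most useful for relative comparison, one option is to rank the draws in a saved trace by a single metric. This sketch assumes the output layout shown above and that the Clocks metric was among the requested IDs; rank_draws_by_clocks is a hypothetical name.

```shell
# Reads per-draw trace output on stdin and prints "clocks<TAB>frame<TAB>draw",
# most expensive draw first.
rank_draws_by_clocks() {
    awk '
        # Header line: "Frame <n> : Draw <m> Label ..."
        $1 == "Frame" && $3 == ":" && $4 == "Draw" { frame = $2; draw = $5; next }
        # Metric line: "Clocks : <value>" belonging to the most recent header.
        $1 == "Clocks" && $2 == ":" && draw != "" {
            printf "%s\t%s\t%s\n", $3, frame, draw
        }' | sort -rn
}
```

Piping the result through head -n 10 gives the ten draws with the highest clock counts in that trace.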

When you are finished capturing per-draw traces, disable detailed profiling mode with ovrgpuprofiler -d.

Command-Line Argument Reference

The following command-line arguments are available for ovrgpuprofiler:

Argument	Description
-r/--realtime	Prints the value of the real-time metrics every second. Accepts an optional comma-separated list of metric IDs to track.
-m/--metrics	Prints the list of available real-time metric IDs with their names and descriptions. When combined with -x, lists the per-draw metric catalog instead.
-v/--verbose	Adds more detailed information to most other commands.
-e/--enable-detailed	Enables detailed profiling mode on the GPU driver; required for render stage tracing and per-draw tracing. Only applies to applications started after this mode is enabled.
-d/--disable-detailed	Disables detailed profiling mode on the GPU driver.
-i/--is-detailed	Queries if the GPU driver is in detailed profiling mode.
-t/--trace	Executes a render stage trace, with an optional trace length as argument in seconds.
-c/--continuous	If you specify this along with -t/--trace, the results of the render stage trace are polled periodically to reduce memory pressure.
-l/--low-overhead	If you specify this along with -t/--trace, the render stage trace is performed in low-overhead mode, which omits many details in exchange for a more accurate measurement.
-x/--drawcall	Switches the tool into per-draw mode. Combine with -m to list per-draw metrics, or with -t and an "=" comma-separated list of metric IDs to record a per-draw trace (for example, -x="1,2,3").