Last year we launched RenderDoc for Oculus, our custom fork of the popular graphics debugger RenderDoc. This Quest-specific tool provides access to low-level GPU profiling data, including information from the GPU’s tile renderer, so you can get actionable profiling information from your Quest. We have continued to improve the tool, adding many new features such as the Tile Browser and Oculus Performance Counters. Its deep profiling features and wide range of graphics debugging capabilities make RenderDoc for Oculus an important tool for solving difficult performance problems. In this blog post, we review how to use RenderDoc for Oculus effectively, as well as some pitfalls to avoid.
Getting Started
RenderDoc for Oculus is a graphics debugger that allows quick and easy single-frame capture and detailed introspection of any application (Download Now). If you are new to RenderDoc itself, we recommend reading the following posts to familiarize yourself with the tool.
For those who are familiar with RenderDoc for Oculus, we will now dive into how to get the most out of the tool.
Timer button behavior
The duration timer button is one of the most popular features of RenderDoc because it is simple and easy to use. Since v23.2, the timer button measures render pass duration rather than drawcall duration. We made this change because performance queries, the mechanism RenderDoc uses to measure per-drawcall duration, don’t work on mobile platforms due to the tiled architecture of mobile GPUs. If you still prefer the old behavior, you can change Settings > Profiling > Timer query type back to drawcall.
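A toy model helps show why per-drawcall timing breaks down on a tiled GPU. This Python sketch assumes every draw in a render pass is replayed once per bin (real drivers may skip draws that don’t touch a given bin), and all function names are hypothetical. It measures how much of the interval between a draw’s first and last execution is actually spent on that draw.

```python
from typing import Dict, List, Tuple

Schedule = List[Tuple[int, str]]  # (bin index, draw name) in execution order

def binned_schedule(draws: List[str], num_bins: int) -> Schedule:
    """Toy tiler: every draw in the render pass is replayed once per bin."""
    return [(bin_index, draw) for bin_index in range(num_bins) for draw in draws]

def draw_spans(schedule: Schedule) -> Dict[str, Tuple[int, int]]:
    """First and last schedule slot at which each draw executes."""
    spans: Dict[str, Tuple[int, int]] = {}
    for slot, (_, draw) in enumerate(schedule):
        first = spans[draw][0] if draw in spans else slot
        spans[draw] = (first, slot)
    return spans

def own_work_fraction(schedule: Schedule, draw: str) -> float:
    """Share of the slots between a draw's first and last execution that belong to it."""
    first, last = draw_spans(schedule)[draw]
    own = sum(1 for _, name in schedule[first:last + 1] if name == draw)
    return own / (last - first + 1)

schedule = binned_schedule(["A", "B", "C"], num_bins=4)
print(draw_spans(schedule)["A"])          # (0, 9)
print(own_work_fraction(schedule, "A"))   # 0.4
```

In this model, draw A’s span covers 10 scheduling slots but only 4 of them are A’s own work, so a timer bracketing that span would mostly measure other draws. A whole render pass, by contrast, occupies one contiguous stretch of the schedule.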
Optimization level
When you launch an app through RenderDoc, it replaces your app’s API calls with its own in order to record all the commands issued to produce a frame. To extract the resources and state it presents in the UI, RenderDoc sometimes has to insert or alter commands, so its playback can contain phantom operations. To minimize this problem during profiling, we force the optimization level to Fastest when you select the profiling mode replay context. As a side effect, some resources will appear black in profiling mode. If you need to inspect these resources and profile at the same time, open the capture in mainline RenderDoc and, simultaneously, open the same capture file in RenderDoc for Oculus with the profiling mode replay context. As long as your machine has enough memory to hold two copies of the capture, you can profile and inspect resources side by side.
Because we force this setting to “Fastest” in the profiling mode replay context, we recommend keeping it at “Balanced” in the non-profiling replay context for convenient resource inspection.
GLES/Vulkan API Errors
Before starting any performance investigation, check your app for API errors, because all of our performance tools expect a valid command stream. Invalid or malformed commands cause API errors, which can result in undefined behavior such as crashes or poor performance. Since v29, we have included Vulkan Validation Layer support in our OS release; you can use the layers from the command line or through RenderDoc.
RenderDoc API Validation
To check your app for API errors, enable API validation when opening a capture for replay. Note that API validation cannot be enabled with the profiling mode replay context, because we want to keep overhead as low as possible when profiling with RenderDoc for Oculus.
After you open your capture with API validation enabled, every API error RenderDoc detects appears in the errors and warnings panel.
Tile Timeline and Tile Browser
The tiled GPU architecture in Quest and Quest 2 optimizes memory bandwidth usage by rendering the framebuffer in tiles, also called bins. Each tile is stored temporarily in the GPU’s fast on-chip memory, where it can be accessed many times, and only needs to be written out to slower system memory once. It is therefore very important for mobile VR apps to take advantage of this optimization.
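A back-of-the-envelope model makes the bandwidth saving concrete. This Python sketch assumes that, without tiling, every layer of overdraw is written to system memory, while a tiler resolves those layers in tile memory and writes each pixel out once. It ignores blending reads, depth traffic, and compression, and the function and parameter names are hypothetical.

```python
def color_write_bytes(width: int, height: int, bytes_per_pixel: int,
                      overdraw: float, tiled: bool) -> float:
    """Rough system-memory color-write traffic for one framebuffer.

    Assumptions: without tiling, every layer of overdraw is written to system
    memory; with tiling, layers are resolved in tile memory and each pixel is
    written out once. Blending reads, depth traffic, and compression are ignored.
    """
    layers_written = 1.0 if tiled else overdraw
    return width * height * bytes_per_pixel * layers_written

# Hypothetical 1000 x 1000 RGBA8 target with an average overdraw of 3.
print(color_write_bytes(1000, 1000, 4, overdraw=3.0, tiled=False))  # 12000000.0
print(color_write_bytes(1000, 1000, 4, overdraw=3.0, tiled=True))   # 4000000.0
```

Under these assumptions, the hypothetical target costs 12 MB of color writes without tiling versus 4 MB with it, and the gap grows with overdraw.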
RenderDoc for Oculus’ Tile Timeline feature shows the different stages Qualcomm’s GPU goes through when rendering a scene.
You might notice that some bins are bigger than others. This is due to tile packing, in which tiles corresponding to peripheral vision are merged and rendered at a lower resolution. The heatmap overlay may also extend beyond the framebuffer because the framebuffer dimensions don’t align with the bin dimensions. These partially occupied edge bins use custom scissor tests to eliminate work on out-of-bounds pixels.
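The edge-bin behavior follows from simple integer arithmetic. This Python sketch assumes a uniform bin size (the 256 x 256 value in the example is hypothetical; the driver chooses real bin sizes) and ignores tile packing. It computes each bin’s scissor rectangle clipped to the framebuffer, which is where the partially occupied edge bins come from.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)

def bin_scissors(fb_w: int, fb_h: int, bin_w: int, bin_h: int) -> List[Rect]:
    """Scissor rectangle for each bin, clipped to the framebuffer, in row-major order."""
    cols = -(-fb_w // bin_w)  # ceiling division
    rows = -(-fb_h // bin_h)
    rects: List[Rect] = []
    for row in range(rows):
        for col in range(cols):
            x, y = col * bin_w, row * bin_h
            rects.append((x, y, min(bin_w, fb_w - x), min(bin_h, fb_h - y)))
    return rects

def partial_bins(rects: List[Rect], bin_w: int, bin_h: int) -> List[Rect]:
    """Bins whose scissor is smaller than a full bin (the edge bins)."""
    return [r for r in rects if r[2] < bin_w or r[3] < bin_h]

rects = bin_scissors(1000, 600, 256, 256)  # hypothetical sizes
print(len(rects), len(partial_bins(rects, 256, 256)))  # 12 6
print(rects[-1])  # (768, 512, 232, 88)
```

Here a 1000 x 600 framebuffer needs a 4 x 3 grid of bins covering 1024 x 768 pixels, so 6 of the 12 bins are only partially occupied.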
Additionally, the frame time reported by the Tile Timeline might be much longer than the frame time reported by the duration timer or other frame-time measurement methods. This is because gathering timing information and attributes introduces overhead, and with a large number of render stages, that overhead can add up significantly. Also, Quest 2’s resolve stages (such as color write or depth write) can overlap with rendering. Running rendering concurrently with resolves adds a nontrivial amount of overhead and can throw off our performance measurements.
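One way overlap can distort a per-stage total is double counting: if stage durations are simply added up, time during which a resolve runs alongside rendering is counted twice. This Python sketch, using made-up timestamps in arbitrary units, compares that sum with the wall-clock time the stages actually cover. It models only the overlap effect, not the instrumentation overhead described above, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in arbitrary time units

def summed_duration(stages: List[Interval]) -> float:
    """Add up each stage's duration, counting overlapping time more than once."""
    return sum(end - start for start, end in stages)

def wall_clock_duration(stages: List[Interval]) -> float:
    """Length of the union of the intervals, so overlapping time counts once."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(stages):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Made-up timeline: bin 0's resolve overlaps bin 1's rendering.
stages = [(0, 4), (4, 6), (4, 8), (8, 10)]  # render0, resolve0, render1, resolve1
print(summed_duration(stages), wall_clock_duration(stages))  # 12 10.0
```

In this example the summed stage time is 12 units, while the stages span only 10 units of wall-clock time.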
Phantom attributes
As mentioned previously, RenderDoc sometimes has to alter operations during playback. This can affect timing measurements, so it is important to double-check the playback commands and make sure they match what your app is doing.
The figure above shows two sets of surface attributes from the same UE4 Vulkan app: on the left, the surface attributes from a Perfetto capture; on the right, the surface attributes from a RenderDoc for Oculus capture. The color attachment attributes on the left show “Tiled|UBWC”, while those on the right show 0. UBWC is Qualcomm’s Universal Bandwidth Compression, which can speed up resolves. It is not a Vulkan/GL attribute that you can set for your framebuffer; it is chosen automatically when your framebuffer configuration satisfies the requirements for UBWC. One of those requirements is that the image must not be created with the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT flag, which RenderDoc inserts in order to extract MSAA contents. This is fixable, but the fix will take some time.
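The UBWC requirement named above can be expressed as a bit test on the `flags` field of VkImageCreateInfo. This Python sketch mirrors the Vulkan constant (VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT is 0x00000008 in the Vulkan specification); the helper names are hypothetical, and the sketch checks only this one requirement, so passing the check does not mean an image will get UBWC.

```python
# Value of VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT from the Vulkan specification.
VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT = 0x00000008

def rules_out_ubwc(image_create_flags: int) -> bool:
    """True if the flags include MUTABLE_FORMAT, which disqualifies UBWC.

    Only this single requirement is checked; other UBWC requirements are not modeled.
    """
    return bool(image_create_flags & VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT)

def with_replay_flag(app_flags: int) -> int:
    """Hypothetical model of the replay adding MUTABLE_FORMAT to read MSAA contents."""
    return app_flags | VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT

app_flags = 0  # the app's attachment is created without MUTABLE_FORMAT
print(rules_out_ubwc(app_flags))                    # False
print(rules_out_ubwc(with_replay_flag(app_flags)))  # True
```

This is why the same attachment can report “Tiled|UBWC” when the app runs normally and 0 in the replay.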
This highlights that the measurements you get from RenderDoc for Oculus might not exactly reproduce your app’s behavior, so you should take measurements from multiple sources as a sanity check.
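One way to make that sanity check systematic is to diff the attributes two tools report for the same surface. In this Python sketch, the attribute keys and values are hypothetical stand-ins for what you would read from a Perfetto trace and a RenderDoc for Oculus capture; the function returns every key whose value differs or is missing on one side.

```python
from typing import Dict, List, Tuple

def attribute_mismatches(source_a: Dict[str, str],
                         source_b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (key, value in A, value in B) for every attribute that differs."""
    keys = sorted(set(source_a) | set(source_b))
    return [(k, source_a.get(k, "<missing>"), source_b.get(k, "<missing>"))
            for k in keys if source_a.get(k) != source_b.get(k)]

# Hypothetical attributes for one surface, as reported by each tool.
perfetto = {"color attachment": "Tiled|UBWC", "depth attachment": "Tiled"}
renderdoc = {"color attachment": "0", "depth attachment": "Tiled"}
print(attribute_mismatches(perfetto, renderdoc))
# [('color attachment', 'Tiled|UBWC', '0')]
```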
Oculus Performance Counters
There are currently 49 metrics available in RenderDoc for Oculus’ performance counters. You can check out our documentation on all the metrics here. One of the most important metrics is GPU clock. When RenderDoc for Oculus queries the Oculus Performance Counters, drawcalls are executed one by one without parallelism. For every cycle in which the GPU is performing an operation for the drawcall, the GPU clock counter is incremented by one. Drawcalls that occupy more bins naturally have a larger GPU clock value because bins are processed one at a time. The GPU clock metric therefore measures drawcall latency rather than throughput. In normal execution, the device employs a variety of latency hiding techniques that can overlap many operations. Avoid over-indexing on the GPU clock metric, because latency hiding can be just as crucial to your app’s performance.
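Because GPU clock is a cycle count, converting it to time requires the GPU’s clock frequency. This Python sketch assumes a hypothetical 500 MHz clock (substitute your device’s actual GPU frequency), and the draw names and counter values are also hypothetical. Summing per-draw values measured one at a time gives a serialized estimate that ignores the latency hiding described above, so it will tend to overstate what the same draws cost in normal execution.

```python
from typing import Dict

def cycles_to_ms(cycles: int, gpu_clock_hz: float) -> float:
    """Convert a GPU clock cycle count to milliseconds at a fixed frequency."""
    return cycles / gpu_clock_hz * 1000.0

def serialized_estimate_ms(gpu_clock_per_draw: Dict[str, int], gpu_clock_hz: float) -> float:
    """Sum of per-draw latencies measured in isolation (no overlap between draws)."""
    return cycles_to_ms(sum(gpu_clock_per_draw.values()), gpu_clock_hz)

HYPOTHETICAL_GPU_CLOCK_HZ = 500e6  # assumption: replace with your device's GPU frequency
draws = {"opaque_pass": 600_000, "sky": 400_000}  # hypothetical GPU clock values
print(serialized_estimate_ms(draws, HYPOTHETICAL_GPU_CLOCK_HZ))
```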
Percentage Time Metrics
The percentage time metrics show how a drawcall’s cost is distributed between the vertex stage and the fragment stage. For a compute shader dispatch, the %Time Compute metric should be near 100%. Together, the vertex, fragment, and compute percentages should add up to 100%, representing the total time spent in the drawcall.
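As a quick triage step, you can pick the dominant stage from the three percentages and check that they sum to roughly 100%. This Python sketch assumes you have copied the three values for one drawcall; the function name and the 5-point tolerance are hypothetical choices.

```python
def dominant_stage(pct_vertex: float, pct_fragment: float, pct_compute: float,
                   tolerance: float = 5.0) -> str:
    """Return the stage with the largest share of a drawcall's time.

    Raises ValueError if the three percentages are not within `tolerance`
    points of 100 (the tolerance is an arbitrary choice).
    """
    total = pct_vertex + pct_fragment + pct_compute
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"percentages sum to {total:.1f}, expected about 100")
    shares = {"vertex": pct_vertex, "fragment": pct_fragment, "compute": pct_compute}
    return max(shares, key=shares.get)

print(dominant_stage(22.0, 78.0, 0.0))  # fragment
print(dominant_stage(0.0, 0.0, 99.0))   # compute
```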
Shader Invocation Metrics
Once you know whether a drawcall is vertex bound or fragment bound, you can check the shader invocation metrics to see whether the shaders themselves need optimizing. A high vertex shader cost can be due to a high vertex count (Vertices Shaded) or to an expensive vertex shader (Vertex Instructions). Similarly, fragment shader cost can be due to the number of fragments (Fragments Shaded) or to shader complexity (Fragment Instructions).
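To separate those two causes, you can normalize instructions by invocations and compare a drawcall against a baseline. This Python sketch assumes that Vertex Instructions and Fragment Instructions are totals for the drawcall (check the metric documentation for the exact definition); the class, the function, and the baseline values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StageCounters:
    invocations: int   # Vertices Shaded or Fragments Shaded
    instructions: int  # Vertex Instructions or Fragment Instructions (assumed drawcall total)

    def per_invocation(self) -> float:
        return self.instructions / self.invocations

def cost_driver(draw: StageCounters, baseline: StageCounters) -> str:
    """Which grew more relative to the baseline: invocation count or per-invocation cost?

    The baseline must have nonzero invocations and instructions.
    """
    count_ratio = draw.invocations / baseline.invocations
    complexity_ratio = draw.per_invocation() / baseline.per_invocation()
    return "invocation count" if count_ratio >= complexity_ratio else "shader complexity"

baseline = StageCounters(invocations=1_000, instructions=20_000)  # 20 per invocation
print(cost_driver(StageCounters(4_000, 88_000), baseline))  # invocation count
print(cost_driver(StageCounters(1_000, 90_000), baseline))  # shader complexity
```

The first draw shades four times as many invocations at only slightly higher per-invocation cost, so reducing invocations is the better lever; the second draw shades the same number of invocations with a much heavier shader.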
Memory Metrics
Besides instruction count, poor memory locality can lower cache effectiveness and reduce performance. Memory metrics such as % Vertex Fetch Stall, % Texture Fetch Stall, L1 Texture Cache Miss Per Pixel, % Texture L1 Miss, % Texture L2 Miss, and % Stalled on System Memory measure the negative effects of memory operations on your drawcalls. To pinpoint the cause within a drawcall, you’ll need the next set of metrics.
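A simple filter can surface drawcalls whose memory metrics look elevated. In this Python sketch, the metric names are the percentage metrics listed above; the 20% threshold and the helper name are hypothetical and should be tuned against your own captures. L1 Texture Cache Miss Per Pixel is left out because it is not a percentage.

```python
from typing import Dict, List

# Percentage memory metrics named in this post.
PERCENT_MEMORY_METRICS = (
    "% Vertex Fetch Stall",
    "% Texture Fetch Stall",
    "% Texture L1 Miss",
    "% Texture L2 Miss",
    "% Stalled on System Memory",
)

def elevated_memory_metrics(counters: Dict[str, float],
                            threshold_pct: float = 20.0) -> List[str]:
    """Names of percentage memory metrics above a (hypothetical) threshold."""
    return [name for name in PERCENT_MEMORY_METRICS
            if counters.get(name, 0.0) > threshold_pct]

# Hypothetical counter values for one drawcall.
draw = {"% Texture Fetch Stall": 35.0, "% Texture L1 Miss": 12.0,
        "L1 Texture Cache Miss Per Pixel": 0.4}
print(elevated_memory_metrics(draw))  # ['% Texture Fetch Stall']
```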
Per-Shader Metrics
Per-Shader Metrics tell you how many texture fetches, arithmetic operations, and elementary function operations your shaders perform. A high “Textures / Vertex” or “Textures / Fragment” value combined with a high “Vertices Shaded” or “Fragments Shaded” value can result in a high “% Vertex Fetch Stall” or “% Texture Fetch Stall”. Note that SSBO and R/W buffer operations count as texture operations and can increase “Textures / Vertex” or “Textures / Fragment”. The “ALU / Vertex” and “ALU / Fragment” metrics measure the number of arithmetic operations your shaders use. Finally, “EFU / Vertex” and “EFU / Fragment” measure the number of elementary function operations, such as sine and cosine.
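The combination described above amounts to a product: fetches per invocation times invocations. This Python sketch ranks drawcalls by that estimated fetch volume; the draw names and counter values are hypothetical. Per the note above, the per-invocation texture count would already include SSBO and R/W buffer accesses.

```python
from typing import Dict, List, Tuple

def rank_by_fetch_volume(draws: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float]]:
    """Rank drawcalls by estimated texture fetches, highest first.

    Each value is (Textures / Fragment, Fragments Shaded); their product
    estimates the drawcall's total texture fetches.
    """
    volume = {name: per_inv * count for name, (per_inv, count) in draws.items()}
    return sorted(volume.items(), key=lambda item: item[1], reverse=True)

draws = {"A": (2.0, 1_000_000), "B": (6.0, 200_000)}  # hypothetical counters
print(rank_by_fetch_volume(draws))
# [('A', 2000000.0), ('B', 1200000.0)]
```

Draw A has a lower Textures / Fragment value than draw B but issues more fetches overall because it shades five times as many fragments.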
Shader Editing
Standard RenderDoc includes a shader editing feature that you can use for quick shader experiments without rebuilding your app. To use it, make sure your shader processing tool paths point to valid locations. Under Settings > Shader Viewer, click Edit to check your paths.
With the shader processing tool paths set up correctly, you can change a shader by navigating to it and clicking Edit to open the Editing Shader Module panel.
After you finish making changes, click Refresh to compile the shader so the changes take effect. Subsequent profiling measurements reflect your changes until you close the Editing Shader Module panel.
Demo
Here is a video walkthrough that uses all the profiling features in RenderDoc for Oculus.
Closing
Creating an immersive and engaging VR experience requires a deep understanding of the platform. As we learn more about the capabilities of the Quest GPU, we will make upgrades and improvements to the toolsets to reflect the performance characteristics of the platform.