Showdown is a PC VR demo built by Epic Games with Unreal Engine 4 (UE4). First shown at Oculus Connect 2014, Showdown presents an action cinematic in bullet-time slow motion; it was originally released for the
Oculus Rift.
We converted the Showdown demo to run on Meta Quest 2 with the goal of maintaining the visual fidelity of the PC version. The finished project runs on Quest 2 at a consistent 90 FPS. To reach that mark, we had to significantly optimize CPU and GPU performance for the mobile renderer.
In
our previous post, we examined how Application SpaceWarp (AppSW) improved the performance of Showdown and demonstrated how to profile the impact of AppSW and other render settings.
In this article, we’ll review our process for converting Showdown to Quest 2, explain how to identify performance bottlenecks, and describe specific optimizations for CPU- and GPU-heavy apps. This post is intended as a reference for developers porting their own project to Quest. We may mention settings or optimizations that weren’t used in Showdown but might be useful to other developers.
You can find the Quest app on App Lab and the project source code in the
Oculus fork of UE4.27. We recommend VR developers stay on the Oculus fork of UE4.27 until our support for VR in UE5 is ready for production use. See our documentation on
how to gain access to the Oculus fork and what it includes. Showdown for Quest depends on the forked engine because it includes AppSW and the mobile tonemap subpass; as of this post’s publication, these features have not yet been upstreamed to Epic’s repo.
Please note that while the Showdown app does run on Quest 1, all profiling and performance data in this doc were collected on Quest 2.
Our Process
Starting with the PC VR project, our conversion involved several phases:
Get the project building and running on Quest
Disable performance-intensive features
Measure baseline performance
Optimize the stripped-down project
Optimize individual features as we re-enable them
Re-enable feature
Measure performance impact
Optimize as needed
Building and Running
After opening a copy of the PC project in the Unreal Editor and adding Android to the Supported Platforms, there are some other less obvious settings to change. A quick way to find many of these settings is to use the Oculus Performance Window.
[Project Settings > Plugins > OculusVR > Launch Oculus Performance Window]
Project settings
Disable Mobile HDR - Mobile HDR is not currently supported by Quest. Disabling HDR lightens the load on the GPU as it reduces the data processed per pixel.
[Project Settings > Engine > Rendering > VR > Mobile HDR]
Enable Multi-View - This allows more efficient rendering in VR by processing tiles for both left and right eyes together.
[Project Settings > Engine > Rendering > VR > Mobile Multi-View]
FFR Dynamic - Adjusts the foveation level automatically based on GPU utilization. More info on FFR settings can be found
here.
[Project Settings > Plugins > OculusVR > Mobile]
Vulkan - We highly encourage using Vulkan on Quest. The Oculus fork of Unreal includes several Vulkan specific fixes and improvements. Vulkan is required for both AppSW and the tonemap subpass.
Additional details on setting up your project for Quest can be found
here.
Other changes we made to the project outside the scope of this document include:
Materials
Disable High Cost Features
Our next step was to simplify the project as much as possible to reach passable performance (at or above 60 FPS) on Quest. To do so, we:
Hid any actors that weren’t essential to the overall experience.
Disabled all particle emitters, re-enabling them one by one later in the process.
Removed some visual effects (window reflections, the halo around the robot’s eye, etc.) that were too expensive and didn’t contribute much to the overall experience.
At this point we were able to package and run an APK on device, though it still suffered from a poor framerate and frequent hitches. If you encounter similar issues, don’t worry. We’ll address each of them, but first collect some performance metrics so you can measure improvement as changes are made.
Measuring Performance
The key to optimization is identifying the specific system resources that are overloaded. With that knowledge, you can target features and processes that are taxing those resources and optimize or eliminate them.
Each time you profile your app, ask these questions:
Is the app meeting my performance targets? Specifically, is the frame duration meeting my target (e.g., 13.9 ms for 72 FPS or 11.1 ms for 90 FPS)?
If not, is the app CPU or GPU bound?
If CPU bound, is the main thread or the render thread taking too long?
If GPU bound, is the application vertex heavy or fragment heavy?
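To make the arithmetic concrete, here is a rough sketch of that triage in Python. The timing inputs correspond to the per-frame stats discussed throughout this section; the function names and thresholds are ours, not an Oculus API.

```python
def frame_budget_ms(target_fps):
    """Per-frame time budget in milliseconds for a target framerate."""
    return round(1000.0 / target_fps, 1)

def triage(game_ms, draw_ms, gpu_ms, budget_ms):
    """Rough first guess at the bottleneck from per-frame timings.

    game_ms: game thread time, draw_ms: render thread time,
    gpu_ms: app GPU time, budget_ms: e.g. frame_budget_ms(90).
    """
    if max(game_ms, draw_ms, gpu_ms) <= budget_ms:
        return "meeting target"
    # If a CPU thread is at least as long as the GPU time, the CPU is the gate.
    if game_ms >= gpu_ms or draw_ms >= gpu_ms:
        return "game thread" if game_ms >= draw_ms else "render thread"
    return "GPU"
```

For example, `triage(15.0, 8.0, 9.0, frame_budget_ms(72))` points at the game thread, while `triage(5.0, 6.0, 14.0, frame_budget_ms(90))` points at the GPU.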
Let's take a look at our process for determining bottlenecks and ways to fix them. We may describe several tools for examining each area. There is significant overlap between profiling tools and every developer has their own preferences, so we try to present multiple options.
CPU or GPU Bound?
When you need to determine whether your bottleneck is on the CPU or GPU, there are several ways to check which is pushing the framerate down.
Oculus Developer Hub (ODH) - Observe these stats in the Performance Analyzer tab. Learn more and download
here.
CPU > CPU Level & GPU > GPU Level - “If an app with performance issues is at CPU 4 and GPU 2, the app must be CPU bound since there is still available GPU overhead. However, if both levels are 4 and the app has issues, this number is not as useful, and other metrics should be used.”
Timing > App GPU Time - Displays the app GPU time for a single frame. If the length of time shown is longer than a single frame’s length (11.1ms for 90 frames per second), the app is GPU bound.
GPU > GPU Utilization - If this value is maxed 100%, the app is GPU bound.
VrApi - You can find equivalent stats in the VrApi logcat output with “adb logcat -s VrApi”.
CPU/GPU Levels
App Time
GPU%
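If you want to track these stats over a run, it is easy to script the extraction. Below is a sketch in Python; the sample line is illustrative and abbreviated, as real VrApi output contains more fields and the exact format varies across OS versions.

```python
import re

# Illustrative, abbreviated VrApi logcat line (real output has more fields).
SAMPLE = "FPS=72,Prd=32ms,Tear=0,Early=0,Stale=0,CPU4/GPU=2/2,App=3.76ms"

def parse_vrapi(line):
    """Pull the stats discussed above out of a VrApi log line."""
    fps = int(re.search(r"FPS=(\d+)", line).group(1))
    cpu_lvl, gpu_lvl = map(int, re.search(r"CPU\d*/GPU=(\d+)/(\d+)", line).groups())
    app_ms = float(re.search(r"App=([\d.]+)ms", line).group(1))
    return {"fps": fps, "cpu_level": cpu_lvl, "gpu_level": gpu_lvl, "app_ms": app_ms}
```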
CPU Performance
When profiling the CPU you will see two main threads: the game thread and the render thread. If either thread takes more than the desired frame duration, it will negatively impact your framerate. Let’s look at how to determine whether these threads are taking too long and how to optimize them.
Which thread is taking too long?
UE4 Session Frontend
Launch your app on device from the Unreal Editor
[Window > Developer Tools > Session Frontend]
Select the Console tab
On the left, expand “My Sessions” and “Launch On Device”
Select your Quest
At the bottom of the console tab enter “stat unit” and click “Send Command”
On device, view the values for Game (game thread) and Draw (render thread)
ADB
Alternatively, you can enable the same stats using this adb command:
adb shell "am broadcast -a android.intent.action.RUN -e cmd 'stat unit'"
From the overlay you can determine if the game or render thread is taking longer. In the screenshot above, the app is clearly constrained by the game thread. Next we will look at how to optimize both game and render threads.
Game Thread
The game thread handles all your game logic and physics simulations. It can get bogged down by lots of actors and complex blueprint logic.
Actors
Reduce Bones of Influence - Skeletal mesh ticks cost performance on the CPU. Reduce skeletal meshes to 2 bones of influence using the
Skeletal Mesh Reduction Plugin.
Hide Actors - Where possible, hide actors to reduce CPU processing. After the car flips over, it obscures the soldiers when looking backward. We hide the soldiers at this point as they aren’t visible anyway.
Blueprints
Increase Tick Interval - Some things don’t need to update every frame. Increase the tick interval where possible. The default value is zero and will tick every frame. Note that changes to tick interval can affect physics / collisions or even the look of visual effects, so be careful where you use this. If simply waiting on an action, disable ticking and use event handlers instead.
[Blueprint > Details > Actor Tick > Tick Interval (secs)]
Nativize Blueprints - If you have particularly complex blueprints, you can make them more efficient by converting to native C++.
Physics
Disable Collision - This won’t work for every scenario, but because Showdown is not interactive, we were able to save physics processing by disabling collision detection on the soldiers.
[Details > Collision > Collision Presets = NoCollision]
Particles
Epic provides helpful details about particle system optimization
here and
here.
Reduce Particle Count and Lifetime - “The more particles in a scene, and the longer they live, the more evaluation is required. Limiting lifespan to the duration required for the effect is good practice all around.”
Reduce Active Emitters - “Tick Time is directly influenced by the number of active EmitterActors in your scene. The more active emitters in the scene, the higher Tick Time will be. Emitters should only be set to autoActivate if they are required to loop when the level starts.”
Disable Collision - Collision modules are quite expensive and can have a noticeable effect on CPU performance. Consider disabling collision on particles where visually acceptable. In the particle editor, remove the collision module from each emitter.
Render Thread
The render thread issues draw calls to the GPU. Combining objects into a single draw call is more efficient even if there is no reduction in geometry data because of the overhead of preparing and issuing the draw call. The primary way to improve render thread performance is to reduce draw calls by combining actors and utilizing instancing.
tldr: Reduce draw calls
Actors
Merge Actors - Reduce draw calls by combining objects into a single actor. Keep in mind that as you merge objects, the combined bounding box increases in size. This larger bounding box can mean the object is culled less often, which results in more draw calls for both the CPU and GPU to process. It’s important to profile your scene to ensure performance actually improves when merging. In Showdown, we made extensive use of the Proxy Geometry Tool to merge buildings and other parts of the scene.
[Window > Developer Tools > Merge Actors]
Hide Actors - Hiding the soldiers obscured by the flipped car, as described above in the game thread section, also reduces our draw calls.
Instancing - Instancing allows you to render a batch of identical meshes in a single draw call. The vertices of the mesh are already available to the GPU. However, the GPU also needs a transform to render each occurrence of the mesh. Instancing stores those transforms in GPU memory so they can be batched into a single draw call.
Auto Instancing - This is the simplest approach. The engine will discover and batch duplicate meshes for you.
Instanced Static Mesh (ISM) - Instead of creating multiple static meshes in the scene, create a single empty actor, add the Instanced Static Mesh component, and add instances in the component properties. Note the instances are not individually culled and cannot be moved at runtime.
Hierarchical Instanced Static Mesh (HISM) - Similar to ISMs except with LOD and distance culling support at additional performance cost.
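If you want to experiment with auto instancing on mobile, note that it depends on GPU Scene support. A minimal sketch of the config change (cvar name as of UE4.22+; verify against your engine version):

```ini
; DefaultEngine.ini - assumption: UE4.22+ mobile auto-instancing path
[SystemSettings]
r.Mobile.SupportGPUScene=1
```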
Epic provides more details on render thread optimization
here.
GPU Performance
Unlike GPUs that have separate compute units for vertex and fragment shaders, Qualcomm uses a
unified shader architecture: the same cores run both vertex and fragment shaders. You won’t encounter a situation where there is so much fragment work that vertex work can be done in parallel at no cost.
That said, it is still useful to know which stage, fragment or vertex, is demanding the majority of your GPU time as different optimizations apply to each.
You can use ovrgpuprofiler to see how much time is spent on fragments versus vertices.
Another test is to render fewer pixels by shrinking the app’s render scale to something small, like 0.2. This will render fewer fragments, but retain scene complexity.
Run VrApi
Make a note of the App time
Reduce PixelDensity
Check VrApi App time again
If App time improves, the app is fragment heavy.
If App time does not improve, the app is vertex heavy (or potentially CPU bound).
Note: This method can also be used as a good indicator of GPU vs CPU bound, as mobile VR apps are much more likely to be fragment bound than vertex bound. If app time does not improve with lowered pixel density, you should take another look at CPU utilization.
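The reason a small render scale is such a strong signal is that fragment count scales with the square of pixel density. A quick sketch:

```python
def fragment_scale(pixel_density):
    # Rendered fragment count scales with the square of the render scale,
    # since both width and height are multiplied by pixel_density.
    return pixel_density ** 2

# At PixelDensity 0.2 the app shades only 4% of the original fragments,
# so any remaining frame time is dominated by vertex work or the CPU.
```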
Vertex
Vertex heavy apps have issues with scene complexity. This is essentially too many triangles rendering at the same time.
tldr: Reduce triangles/vertices
Generate LODs - Reduce mesh detail when at a distance from the camera. Unreal provides an Automatic LOD tool to reduce the poly count of your static meshes without manual artist work. Use this for every mesh you can. For Showdown this includes cars, trash cans, the robot, and buildings. We found that the default distances weren’t triggering level changes, so we manually tweaked them. Our workflow for LODs was:
Open mesh in the Static or Skeletal mesh editor
Disable [LOD Settings > Auto Compute LOD Distances]
Increase [LOD Settings > Number of LODs] to 2, creating LOD 1
Enable [View Mode > Level Of Detail Coloration > Mesh LOD Coloration]
View Mode normally appears as “Lit” in the top left of the scene view.
This can be changed in both the Mesh Editor and your main editor window.
Tweak [LOD 1 > Screen Size] and confirm LODs are switching.
Mesh LOD Coloration
Reduce Mesh Complexity - For complex meshes, you may find that LOD 0 contains more triangles than are visually discernible. In this case, you can either reduce the Percent Triangles for the LOD, or increase the Minimum LOD to LOD 1.
[Mesh Editor > Details > LOD 0 > Reduction Settings > Percent Triangles]
[Mesh Editor > Details > LOD Settings > Minimum LOD]
Remove Geometry - You can remove portions of your meshes right inside the UE4 editor with the Static Mesh Editor. We manually stripped primitives out of meshes that would not be seen by the user (e.g., the back sides of buildings, unseen background “skyline” buildings, etc.).
Reduce Mesh Particles - Sprite particles are more efficient because they add fewer tris to the scene. Use mesh particles sparingly. Fewer meshes will reduce draw calls.
Fragment
Fragment heavy apps have issues with shader complexity. In effect, too much time is spent calculating individual pixels.
tldr: Simplify shaders
Textures / Materials
Merge and Bake - Create a texture from one or more materials to reduce shader complexity.
Merge Texture Sampling Nodes - Avoid resampling the same texture multiple times. This is a common inefficiency with artist-made shaders. Artists may add variations to their material by sampling the same textures multiple times with a scale and offset to the UV. This variation comes with very little cost in terms of making art, but the texture fetch is impossible to optimize out because the UVs are different.
Fully Rough - Use wherever possible to force the materials to be completely rough. This saves a number of material instructions and one sampler.
[Material Editor > Details > Material > Fully Rough]
Disable Lightmap Directionality - Disable this option to have flat, but less expensive, lights which don't use per-pixel normals.
[Material Editor > Details > Mobile > Use Lightmap Directionality]
LOD Bias - Some textures, even those on small objects like soda cans, were unnecessarily large. Adjusting the LOD Bias to the lowest acceptable level will minimize texture bandwidth consumption.
[Texture Editor > Details > Level Of Detail > LOD Bias]
Replace Complex Materials - For example, we replaced the costly bullet trail material with a much lower cost version.
Noise Node Levels - Reducing levels on the explosion materials reduced instruction counts substantially. In one case we reduced the value from 7 to 2.
[Material Editor > Noise node > Details > Levels]
Texture Samplers - Switch all texture samplers to trilinear (or bilinear) filtering instead of aniso. Texture Groups allow setting this in configuration with no art adjustments. It takes minimal effort to switch individual texture samplers back to aniso if something looks noticeably worse (e.g., roads, sidewalks, brick walls).
Update config for Texture Group:
In Config/DefaultDeviceProfiles.ini, find [Oculus_Quest2 DeviceProfile].
For each “+TextureLODGroups”, change MinMagFilter to Linear instead of Aniso.
If you set MipFilter to point, it will use bilinear.
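The edit looks roughly like this (the entry below is abbreviated; real TextureLODGroups lines carry many more fields, so modify the existing lines rather than copying this):

```ini
; Config/DefaultDeviceProfiles.ini - abbreviated example entry
[Oculus_Quest2 DeviceProfile]
+TextureLODGroups=(Group=TEXTUREGROUP_World,MinMagFilter=linear,MipFilter=linear)
```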
Update filter for individual texture:
[Texture Editor > Details > Texture > Filter]
Lighting
Dynamic Lights - We attempted to enable dynamic lights in Showdown (during explosions and gunfire), but this caused hitches due to increased fill rate when triggered. In the end, we chose to only use static lights.
Particles
GPU Particle Texture Resolution - We reduced the size to 256x256. Ideally, move particles to the CPU and avoid the GPU work altogether, but the easier solution is to shrink the texture resolution to match the maximum particle count.
[Project Settings > Engine > Rendering > Optimizations]
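The relationship between the state texture size and the particle budget can be sketched as follows (the helper name is ours): a 256x256 texture holds up to 65,536 particle states.

```python
def min_particle_texture_side(max_particles):
    """Smallest power-of-two square texture with at least max_particles texels."""
    side = 1
    while side * side < max_particles:
        side *= 2
    return side
```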
Use CPU instead of GPU - You can use GPU particles, but if you are GPU bound and have headroom on the CPU, try moving them back. In the particle editor, remove any “GPU Sprites” modules. To move them to the GPU again, right click on the emitter and choose [TypeData > New GPU Sprites]. Profile both options.
Emitter Materials - In Showdown, we reworked emitter materials to reduce their complexity. We expected transparent materials to be expensive, but because the particles are small on screen, the total number of pixels they touch is low, so they turned out not to cost much. Some materials relied on refraction; for these, we created new effects (such as the smoke blast during the car explosion and the bullet trails).
Project Settings
Max Anisotropy - If you are using aniso, limiting the anisotropic sample count reduces texture bandwidth, but does not save texture memory. Set r.MaxAnisotropy in DefaultEngine.ini and DefaultScalability.ini.
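For example, in DefaultEngine.ini the cvar can be set under [SystemSettings] (the value 4 here is illustrative):

```ini
; DefaultEngine.ini
[SystemSettings]
r.MaxAnisotropy=4
```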
Pixel Density - If all else fails, you can reduce pixel density from 1.0 to 0.9. Modify vr.PixelDensity using the “Execute Console Command” blueprint node. This is a last resort when the GPU is fragment bound and all other options have been exhausted. We recommend setting pixel density dynamically so that it is only reduced in scenes that are overtaxing the GPU and returned to 1.0 when the load is manageable.
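Engine hookup aside, the control logic for dynamic pixel density can be as simple as the following sketch. The thresholds, step size, and function name are ours; feed it your measured GPU frame time periodically and apply the result via vr.PixelDensity.

```python
def next_pixel_density(current, gpu_frame_ms, budget_ms,
                       lo=0.7, hi=1.0, step=0.05):
    """Step density down when over budget, recover when comfortably under."""
    if gpu_frame_ms > budget_ms:
        return max(lo, round(current - step, 2))
    # Only step back up when there is clear headroom, to avoid oscillation.
    if gpu_frame_ms < 0.85 * budget_ms:
        return min(hi, round(current + step, 2))
    return current
```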
Broader Optimizations
Some optimizations provide significant performance improvements for both CPU and GPU. As these don’t fit neatly into the above categories, we will discuss them here.
Skeletal Mesh Culling - While analyzing the performance of skeletal meshes, we noticed that draw calls were still being submitted for out-of-view meshes, which hurt performance. The steps to fix this were:
Ensured the bounds were being updated as needed by including the component location in the bounds updates.
[Skeletal Mesh Component > Optimization > Component Use Fixed Skel Bounds = false]
Checked the bounds of infiltrators in the editor as these are used for the frustum culling tests. Enable [Show > Advanced > Bounds], then multi-select the relevant skeletal meshes.
The physics asset included the skeleton’s “root” body in the in-game bounds, and the root was placed at the start of the animation sequence and never moved. For example, in shooting animations where the infiltrator moves farther away, the bounds would simply keep expanding, as seen below.
After removing the root node from the bones considered for the bounds, we get a much tighter fit, drastically improving frustum culling.
To remove the root:
Open the Skeletal Mesh editor by double clicking the asset
Select Physics in the upper right hand corner
Select the “root” under Skeleton Tree
Disable [Details > Body Setup > Consider for Bounds]
Before - Infiltrator bounding boxes extend back to root.
After - Tighter bounding boxes result in more efficient object culling
Framerate Hitches
tldr: Eliminate runtime shader compiling
PSO Cache - Collect shader data ahead of time and build it into your application to compile the shaders on startup. On the fly compilation can tax the CPU and cause temporary frame rate reduction or hitches.
There were very obvious hitches in Showdown on first playthrough that were not present the second time through the sequence. Unreal Insights showed the hitches as PSO Vulkan Creation. You can also see these hitches with OVR Metrics as dips in the FPS graph. When built with a properly generated PSO cache, the full sequence runs without hitches the first time through. This is especially important to VR applications as even small hitches are much more noticeable to the user than on a monitor or mobile device.
This Unreal Insights screenshot clearly shows the hitches caused by PSO creation.
See the
Unreal documentation for details on setting this up. You need to make a special build and play through your entire app to collect PSO data, so wait to do this until the project is nearly finished. Once configured, it is fairly simple to capture cache data again if needed.
The Showdown source includes GeneratePSOCache.bat to handle pulling data files from the headset, building the csv, and moving it to the project directory for packaging. You can modify the parameters of this script for use in your own project.
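As a reference point, the runtime side of the pipeline cache is driven by a couple of cvars in DefaultEngine.ini. This is a sketch of the minimal settings; see the Unreal documentation for the full recording and packaging flow.

```ini
; DefaultEngine.ini
[SystemSettings]
; Use a bundled PSO cache at runtime.
r.ShaderPipelineCache.Enabled=1
; Log PSOs during a recording playthrough so they can be gathered into a cache.
r.ShaderPipelineCache.LogPSO=1
```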
Conclusion
Now that you’ve seen how we optimized Showdown, you should have a better understanding of what it takes to port a PC VR project to Quest 2, how to find CPU and GPU bottlenecks, and how to optimize your application accordingly. Please check out the project on
GitHub and post to the
Oculus developer forums with any questions.
Additional resources