This blog post explores when and what to optimize on mobile devices. It provides a high-level overview of game optimization, as well as a workflow for profiling.
Whenever we talk about optimization, the first thing everyone tends to think of is Frames Per Second (FPS). While FPS is a good metric if you want to know when to optimize, it doesn't provide any insight into what to optimize. Oculus provides the OVR Metrics Tool for mobile devices.
The OVR Metrics Tool is a standalone Android application you can install on any Android device. It provides a real-time frame graph without any application-side integration.
To use the OVR Metrics Tool, download it and install the apk. Use the -g flag when installing to automatically grant all permissions.
adb install -g OVRMetricsTool_v1.1.apk
Once installed, enable the metrics tool with the following commands:
adb shell setprop debug.oculus.notifPackage com.oculus.ovrmonitormetricsservice
adb shell setprop debug.oculus.notifClass com.oculus.ovrmonitormetricsservice.PanelService
The next time a VR application launches, a frame graph similar to the image above should show up. Any time the phone is restarted, the above commands need to be re-run to start the metrics tool.
A better metric for optimizing games is milliseconds per frame! Measuring each frame in milliseconds provides an actionable goal. To hit a given frame rate, every frame has to finish all of its CPU and GPU work within a fixed amount of time.
When measuring milliseconds per frame, each frame has a budget, a time it must fit into. Common performance targets for mobile devices are:
- 30 FPS = 33.3 milliseconds per frame
- 60 FPS = 16.6 milliseconds per frame
- 72 FPS = 13.9 milliseconds per frame
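The budgets in the list above come from dividing one second (1000 milliseconds) by the target frame rate. As a minimal sketch, the conversion could look like this in C++. The function names `frame_budget_ms` and `fits_budget` are made up for illustration and are not part of any engine API.

```cpp
#include <cassert>

// Frame budget in milliseconds for a target frame rate:
// 1000 ms per second divided by frames per second.
double frame_budget_ms(double target_fps) {
    assert(target_fps > 0.0);
    return 1000.0 / target_fps;
}

// True when the measured CPU + GPU work for one frame fits the budget.
// (Hypothetical helper; frame_work_ms would come from your profiler.)
bool fits_budget(double frame_work_ms, double target_fps) {
    return frame_work_ms <= frame_budget_ms(target_fps);
}
```

Under these assumptions, a frame that needs 21.3 milliseconds of work fits a 30 FPS budget but not a 60 FPS one.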
Let's take a look at a simple example of using milliseconds as a frame budget. The graph below is something you might see from any profiler. The game in question is running at 30 FPS. The goal is to get it running at 60!
When looking at the above image, we can see that one frame takes 21.3 milliseconds to update and render. The application then waits for the next V-Sync, which is 12 milliseconds away. This makes the total frame time 33.3 milliseconds, or 30 frames per second.
In order to run at 60 frames per second, the time it takes to update and render a frame must be less than 16.6 milliseconds. In the above example, the game logic takes significantly longer than rendering the game. 4.7 milliseconds need to be cut, and the game logic seems like a good place to start.
Think of the green bar in the above graph, which represents the Wait for V-Sync time, as padding. It can't be optimized away, and will always fill out the time remaining until the next frame. After optimizing, the above frame should look something like this:
After optimization, every frame fits into a snug 16.6 millisecond window. There are now two frames spanning the 33.3 milliseconds that only one frame took up before. The optimization in this example was all CPU side, reducing the game logic from 15.3 milliseconds to 9 milliseconds. The wait for V-Sync is still there, as expected, but it now lasts only 1.6 milliseconds. The game in the above graph is running at 60 frames per second.
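The way the V-Sync wait pads each frame can be sketched as rounding the work time up to the next refresh boundary. The model below is a simplification that assumes a single-threaded loop and a 60 Hz display, as in the example; `FrameTiming` and `present_on_vsync` are illustrative names, not an engine API.

```cpp
#include <cassert>
#include <cmath>

// Timing of one frame on a V-Synced display.
struct FrameTiming {
    double presented_ms;   // total time until the frame is shown
    double vsync_wait_ms;  // idle padding before the next V-Sync
};

// Rounds the frame's work time up to the next V-Sync boundary.
// refresh_hz is the display refresh rate (assumed 60 in the example).
FrameTiming present_on_vsync(double work_ms, double refresh_hz) {
    assert(work_ms > 0.0 && refresh_hz > 0.0);
    const double interval_ms = 1000.0 / refresh_hz;
    const double intervals = std::ceil(work_ms / interval_ms);
    const double presented_ms = intervals * interval_ms;
    return {presented_ms, presented_ms - work_ms};
}
```

With 21.3 milliseconds of work the frame misses the first V-Sync and is presented after two intervals (about 33.3 milliseconds). With 15 milliseconds of work (9 for game logic plus 6 for rendering) it is presented after one interval. The computed waits differ slightly from the figures in the text because the text rounds the 60 Hz interval down to 16.6 milliseconds.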
Things get a little more complicated in the real world. The example above shows a simple game loop running on one thread, where rendering happens only after all game logic is done executing. Most modern engines use multithreaded rendering. That is, there is a render thread and a main thread. The main thread executes all of the game's logic and submits draw calls to the render thread. The main thread is able to start processing the next frame while the render thread is still rendering the current one. This will be covered in depth in a future blog post.
Profiling Workflow (What to optimize)
In the above example, it was obvious what needed to be optimized. That example was unrealistic; in an actual game it's usually much harder to tell what needs to be optimized. Having a plan for how to profile a game makes tracking down what needs to be optimized much easier.
CPU or GPU Bound?
The first question when profiling is whether the game is GPU bound or CPU bound. Determining this is fairly easy: don't render anything. To do so, turn off the render camera and let the game continue to run. This eliminates the cost of the render pipeline (culling, submitting draw calls, running shaders, and so on). Keep an eye on both the frame rate of the game and the milliseconds each frame takes.
- If the game's performance is not affected, or affected very little, the game is likely CPU bound.
- If performance improves significantly, the game is likely GPU bound.
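The two rules above can be written down as a small decision helper. This is only a sketch: the source does not say how large an improvement counts as "significant", so the 20% cutoff (`min_gain`) is an assumption to tune per project, and `classify_cpu_gpu` is a hypothetical name.

```cpp
#include <cassert>

enum class Bottleneck { CpuBound, GpuBound };

// baseline_ms:  average frame time with rendering enabled
// no_render_ms: average frame time with the render camera turned off
// min_gain:     fraction of frame time that must be saved to count as a
//               "significant" improvement (0.2 = 20%, an assumed cutoff)
Bottleneck classify_cpu_gpu(double baseline_ms, double no_render_ms,
                            double min_gain = 0.2) {
    assert(baseline_ms > 0.0 && no_render_ms >= 0.0);
    const double gain = (baseline_ms - no_render_ms) / baseline_ms;
    return gain >= min_gain ? Bottleneck::GpuBound : Bottleneck::CpuBound;
}
```

Feed it the time spent updating and rendering rather than the V-Synced frame time, otherwise the V-Sync padding can hide a real improvement.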
CPU Bound
Common causes for a game to be CPU bound are the complexity of game logic, physics simulation, GC stalls, etc.
Use an instrumented profiler, like the ones built into Unity and Unreal, to track down performance bottlenecks.
Focus on optimizing only the most expensive code paths. Any game logic that takes longer than two milliseconds can probably be optimized.
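When an engine profiler is not available, a hand-rolled scoped timer gives a rough version of the same instrumentation. The sketch below is not how the Unity or Unreal profilers work internally. It is a minimal stand-in built on `std::chrono`, and the `ScopedTimer` class and the `UpdateAI` call in the usage comment are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Minimal scoped timer: measures the lifetime of the object and reports
// sections that exceed a threshold (2 ms, following the rule of thumb above).
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label, double warn_ms = 2.0)
        : label_(label), warn_ms_(warn_ms),
          start_(std::chrono::steady_clock::now()) {
        assert(label != nullptr);
    }

    // Milliseconds elapsed since construction.
    double elapsed_ms() const {
        const auto now = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(now - start_).count();
    }

    // Logs the section only if it ran longer than the threshold.
    ~ScopedTimer() {
        const double ms = elapsed_ms();
        if (ms > warn_ms_) {
            std::printf("[slow] %s took %.2f ms\n", label_, ms);
        }
    }

private:
    const char* label_;
    double warn_ms_;
    std::chrono::steady_clock::time_point start_;
};

// Usage (UpdateAI is a hypothetical game function):
//   { ScopedTimer timer("UpdateAI"); UpdateAI(); }
```

Wrapping only the top-level systems first, then drilling into whichever one gets flagged, keeps the timing overhead low.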
GPU Bound
When a game is GPU bound, it can generally be categorized into one of two states: vertex bound or fragment bound.
A game that is vertex bound has issues with scene complexity. On the other hand, a fragment bound game has issues with shader complexity.
The way to test for this is to render fewer pixels. You can do this by setting the game's render scale to something really small, like 0.01. This causes far fewer fragments to be rendered while keeping the scene complexity the same.
- If performance is not affected, the game is likely vertex bound.
- If performance improves, the game is likely fragment bound.
UnityEngine.XR.XRSettings.eyeTextureResolutionScale = 0.01f;
/* Legacy: */ UnityEngine.VR.VRSettings.renderScale = 0.01f;
UHeadMountedDisplayFunctionLibrary::SetScreenPercentage(0.01f);
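To see why such a tiny render scale isolates vertex cost, it helps to look at how few pixels are left. The sketch below assumes the scale applies to both the width and the height of the eye buffer, and uses a 1024x1024 buffer purely as an illustrative size. The function names and the 20% cutoff are assumptions, not engine behavior.

```cpp
#include <cassert>
#include <cmath>

enum class GpuBottleneck { VertexBound, FragmentBound };

// Approximate pixels per eye at a given render scale, assuming the scale is
// applied to width and height independently and each is at least 1 pixel.
double scaled_pixel_count(int width, int height, double render_scale) {
    assert(width > 0 && height > 0 && render_scale > 0.0);
    const double w = std::fmax(1.0, std::floor(width * render_scale));
    const double h = std::fmax(1.0, std::floor(height * render_scale));
    return w * h;
}

// full_scale_ms: GPU frame time at the normal render scale
// tiny_scale_ms: GPU frame time at a very small render scale (e.g. 0.01)
// min_gain:      assumed fraction of time saved that counts as "improves"
GpuBottleneck classify_gpu(double full_scale_ms, double tiny_scale_ms,
                           double min_gain = 0.2) {
    assert(full_scale_ms > 0.0 && tiny_scale_ms >= 0.0);
    const double gain = (full_scale_ms - tiny_scale_ms) / full_scale_ms;
    return gain >= min_gain ? GpuBottleneck::FragmentBound
                            : GpuBottleneck::VertexBound;
}
```

Under these assumptions a 1024x1024 eye buffer at scale 0.01 shrinks from roughly a million pixels to about a hundred, so fragment work all but disappears while every vertex is still processed.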
Vertex Bound
Because we determined whether an application is CPU or GPU bound by disabling all render cameras, some CPU-side calculations such as frustum culling end up being counted as vertex bound. While this isn't completely accurate, these operations are often optimized by the same actions we take to optimize a vertex bound game. Common issues for a vertex bound game are:
- Culling objects is taking too long
- Too many draw calls are being issued
- Too many vertices are being rendered
Fragment Bound
Pixel complexity and overdraw tend to be the main issues for fragment bound games. Be sure to read the Oculus rendering guidelines.
Rinse and Repeat
Profiling and optimizing is an iterative process. Chances are, optimizing away the first bottleneck will not get the game to its performance target. It usually takes a few rounds of optimization to reach the desired frame rate. Go through the full profiling chart after each optimization pass. If the game was CPU bound before and that has been optimized, it could end up being fill (fragment) bound next.
Below is a high-level flow chart of what to test for. Keep following the chart until you hit your target frame rate!