So you’ve decided to build a VR game in Unity and have settled on the Samsung Gear VR as your target platform. Getting it up and running on the device was easy enough, but there’s a problem—the frame rate is just too low. Your reticle snaps and sticks, there are flickering black bars on the sides of your vision, and motion looks like somebody just kicked the camera operator in the shins. You’ve read about how important it is to maintain a solid frame rate, and now you know why—in mobile VR, anything less than 60 frames per second doesn’t just look bad, it feels bad. Your high-end PC runs the game at about 1000 frames per second, but it sounds like a jet engine and actually levitates slightly when the fans really get going. What you need is a way to optimize your masterpiece to run on a mobile chipset. This series of articles is designed to help you do just that.
Part 1: The Gear VR Environment and Traits of Efficient VR Games
This isn’t an all-encompassing exposé on performance optimization for the Gear VR; it’s more of a quick start. In this first post we’ll discuss the Gear VR hardware and the traits of a well-designed mobile VR application. A follow-up post will cover performance improvement for apps you’ve already built. I’ve elected to base this article on the behavior (and quirks) of Unity, as it seems to be very popular amongst Gear VR developers. Still, the concepts presented here should apply to just about any game engine.
Know Your Hardware
Before you tear your project apart looking for inefficiencies, it’s worth thinking a little about the performance characteristics of mobile phones. Generally speaking, mobile graphics pipelines rely on a pretty fast CPU that is connected to a pretty fast GPU by a pretty slow bus and/or memory controller, and an OpenGL ES driver with a lot of overhead. The Gear VR runs on the Samsung Note 4 and the Samsung Galaxy S6. These two product lines actually represent a number of different hardware configurations:
- The Note 4 comes in two chipset flavors. Devices sold in North
America and Europe are based on Qualcomm’s Snapdragon chipset
(specifically, a Snapdragon 805), while those sold in South Korea
and some other parts of Asia include Samsung’s Exynos chipset (the
Exynos 5433). The Snapdragon is a quad-core CPU configuration,
while the Exynos has eight cores. These devices sport two different
GPUs: the Adreno 420 and Mali-T760, respectively.
- The Note 4 devices are further segmented by operating system.
Most run Android 4.4.4 (KitKat) but Android 5 (Lollipop) is now
available as an update on most carriers around the world. The
Exynos-based Note 4 devices all run Android 5.
- The Galaxy S6 devices are all based on the same chipset: the
Exynos 7420 (with a Mali-T760 MP8 GPU). There is actually a second
version of the S6, the Galaxy S6 Edge, but internally it is the
same as the S6.
- All Galaxy S6 devices ship with Android 5.
If this seems like a lot to manage, don’t worry: though the
hardware varies from device to device, the performance profile of
all these devices is pretty similar (with one serious exception—see
“Gotchas” below). If you can make it fast on one device, it should
run well on all the others.
As with most mobile chipsets, these devices have pretty reliable
characteristics when it comes to 3D graphics performance. Here are
the things that generally slow Gear VR projects down (in order of
severity):
- Scenes requiring dependent renders (e.g., shadows and
reflections) (CPU / GPU cost).
- Binding VBOs to issue draw calls (CPU / driver cost).
- Transparency, multi-pass shaders, per-pixel lighting, and other
effects that fill a lot of pixels (GPU / IO cost).
- Large texture loads, blits, and other forms of memcpy (IO /
memory controller cost).
- Skinned animation (CPU cost).
- Unity garbage collection overhead (CPU cost).
On the other hand, these devices have relatively large amounts of RAM and can push quite a lot of polygons. Note that the Note 4 and S6 both have 2560×1440 displays, though by default we render to two 1024×1024 textures (one per eye) to save fill rate.
Know the VR Environment
VR rendering throws hardware performance characteristics into sharp relief because every frame must be drawn twice, once for each eye. In Unity 4.6.4p3 and 5.0.1p1 (the latest releases at the time of this writing), that means that every draw call is issued twice, every mesh is drawn twice, and every texture is bound twice. There is also a small amount of overhead involved in putting the final output frame together with distortion and TimeWarp (budget for 2 ms). It is reasonable to expect optimizations that will improve performance in the future, but as of right now we’re stuck with drawing the whole frame twice. That means that some of the most expensive parts of the graphics pipeline cost twice as much time in VR as they would in a flat game.
[Figure: a sample frame of about 30,000 polygons and 40 draw calls.]
With that in mind, here are some reasonable targets for Gear VR applications:
- 50 – 100 draw calls per frame
- 50k – 100k polygons per frame
- As few textures as possible (but they can be large)
- 1 – 3 ms spent in script execution (Unity Update())
Bear in mind that these are not hard limits; treat them as rules of thumb.
Also note that the Oculus Mobile SDK introduces an API for throttling the CPU and GPU to control heat and battery drain (see OVRModeParams.cs for sample usage). These methods allow you to choose whether the CPU or GPU is more important for your particular scene. For example, if you are bound on draw call submission, clocking the CPU up (and the GPU down) might improve overall frame rate. If you neglect to set these values, your application will be throttled down significantly, so take time to experiment with them.
Finally, Gear VR comes with Oculus’s Asynchronous TimeWarp technology. TimeWarp provides intermediate frames based on very recent head pose information when your game starts to slow down. It works by distorting the previous frame to match the more recent head pose, and while it will help you smooth out a few dropped frames now and then, it’s not an excuse to run at less than 60 frames per second all the time. If you see black flickering bars at the edges of your vision when you shake your head, that indicates that your game is running slowly enough that TimeWarp doesn’t have a recent enough frame to fill in the blanks.
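One way to notice when you drop below 60 frames per second during development is to watch the time between frames. The script below is a minimal sketch of that idea, not part of the Oculus Mobile SDK: the class name FrameBudgetMonitor and the toleranceMs field are my own assumptions, and it uses only standard Unity timing calls.

```csharp
using UnityEngine;

// Hypothetical development helper (not part of the Oculus SDK): logs a
// warning whenever a frame takes longer than the 60 fps budget.
public class FrameBudgetMonitor : MonoBehaviour
{
    // 1000 ms / 60 frames is roughly 16.7 ms per frame.
    const float FrameBudgetMs = 1000f / 60f;

    // Assumed slack so that small timer jitter does not flood the log.
    public float toleranceMs = 1f;

    float lastFrameTime;

    void Start()
    {
        lastFrameTime = Time.realtimeSinceStartup;
    }

    void Update()
    {
        // realtimeSinceStartup is wall-clock time, so it is not affected
        // by Time.timeScale.
        float now = Time.realtimeSinceStartup;
        float frameMs = (now - lastFrameTime) * 1000f;
        lastFrameTime = now;

        if (frameMs > FrameBudgetMs + toleranceMs)
        {
            Debug.LogWarning("Frame took " + frameMs.ToString("F1") +
                             " ms (budget " + FrameBudgetMs.ToString("F1") + " ms)");
        }
    }
}
```

Logging has its own cost and allocates strings, so treat this as a development aid and remove it from shipping builds.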
Designing for Performance
The best way to produce a high-performance application is to design for it up-front. For Gear VR applications, that usually means designing your art assets around the characteristics of mobile GPUs.
Setup
Before you start, make sure that your Unity project settings are organized for maximum performance. Specifically, ensure that the following Player Settings are enabled (and that the orientation is set as shown):
- Static batching
- Dynamic batching
- GPU skinning
- Multithreaded Rendering
- Default Orientation set to Landscape Left
Batching
Since we know that draw calls are usually the most expensive part of a Gear VR application, a fantastic first step is to design your art to require as few draw calls as possible. A draw call is a command to the GPU to draw a mesh or a part of a mesh. The expensive part of this operation is actually the selection of the mesh itself. Every time the game decides to draw a new mesh, that mesh must be processed by the driver before it can be submitted to the GPU. The shader must be bound, format conversions might take place, et cetera; the driver has CPU work to do every time a new mesh is selected. It is this selection process that incurs the most overhead when issuing a draw call.
However, that also means that once a mesh (or, more specifically, a vertex buffer object, or VBO) is selected, we can pay the selection cost once and draw it multiple times. As long as no new mesh (or shader, or texture) is selected, the state will be cached in the driver and subsequent draw calls will issue much more quickly. To leverage this behavior to improve performance, we can actually wrap multiple meshes up into a single large array of verts and draw them individually out of the same vertex buffer object. We pay the selection cost for the whole mesh once, then issue as many draw calls as we can from meshes contained within that object. This trick, called batching, is much faster than creating a unique VBO for each mesh, and is the basis for almost all of our draw call optimization.
All of the meshes contained within a single VBO must have the same material settings for batching to work properly: the same texture, the same shader, and the same shader parameters. To leverage batching in Unity, you actually need to go a step further: objects will only be batched properly if they have the same material object pointer. To that end, here are some rules of thumb:
- Macrotexture / Texture Atlases: Use as few textures as possible
by mapping as many of your models as possible to a small number of
large textures.
- Static Flag: Mark all objects that will never move as
Static in the Unity Inspector.
- Material Access: Be careful when accessing Renderer.material.
This will duplicate the material and give you back the copy, which
will opt that object out of batching consideration (as its material
pointer is now unique). Use Renderer.sharedMaterial.
- Ensure batching is turned on: Make sure Static Batching
and Dynamic Batching are both enabled in Player Settings
(see “Setup” above).
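To illustrate the Material Access rule above, here is a small sketch of swapping an object's look without giving it a unique material. The component name HighlightSwitcher and its fields are hypothetical; Renderer.material and Renderer.sharedMaterial are the standard Unity properties the rule refers to.

```csharp
using UnityEngine;

// Hypothetical example component: highlights an object by pointing its
// renderer at a second material asset that all highlighted objects share.
public class HighlightSwitcher : MonoBehaviour
{
    // Assign both in the Inspector. Every object using this script should
    // reference the same two material assets.
    public Material normalMaterial;
    public Material highlightMaterial;

    Renderer cachedRenderer;

    void Awake()
    {
        cachedRenderer = GetComponent<Renderer>();
    }

    public void SetHighlighted(bool highlighted)
    {
        // Avoid this: accessing .material clones the material for this
        // renderer, so its material pointer becomes unique.
        // cachedRenderer.material.color = Color.yellow;

        // Assigning through .sharedMaterial makes no copy, so objects that
        // point at the same asset remain candidates for batching.
        cachedRenderer.sharedMaterial = highlighted ? highlightMaterial
                                                    : normalMaterial;
    }
}
```

Swapping between pre-made assets is used here instead of editing sharedMaterial's properties, because a change to a shared material affects every object that uses it.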
Unity provides two different methods to batch meshes together: static batching and dynamic batching.
Static Batching
When you mark a mesh as static, you are telling Unity that this object will never move, animate, or scale. Unity uses that information to automatically batch together meshes that share materials into a single, large mesh at build time. In some cases, this can be a significant optimization; in addition to grouping meshes together to reduce draw calls, Unity also burns transformations into the vertex positions of each mesh, so that they do not need to be transformed at runtime. The more parts of your scene that you can mark as static, the better. Just remember that this process requires meshes to have the same material in order to be batched.
Note that since static batching generates new conglomerate meshes at build time, it may increase the final size of your application binary. This usually isn’t a problem for Gear VR developers, but if your game has a lot of individual scenes, and each scene has a lot of static mesh, the cost can add up. Another option is to use StaticBatchingUtility.Combine at runtime to generate the batched mesh without bloating the size of your application (at the cost of a one-time significant CPU hit and some memory).
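As a rough sketch of the runtime option, the script below calls StaticBatchingUtility.Combine on the object it is attached to. The class name RuntimeStaticBatcher is hypothetical, and it assumes you have parented the level's non-moving geometry under a single root object.

```csharp
using UnityEngine;

// Hypothetical loader script: attach to the parent object that holds the
// scene's non-moving geometry.
public class RuntimeStaticBatcher : MonoBehaviour
{
    void Start()
    {
        // Prepares all children of this GameObject for static batching.
        // This is the one-time CPU and memory cost, so run it at load time
        // rather than during gameplay. After combining, the children should
        // no longer be moved individually.
        StaticBatchingUtility.Combine(gameObject);
    }
}
```

Running this once behind a loading screen keeps the CPU spike away from frames the player actually sees.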
Finally, be careful to ensure that the version of Unity you are using supports static batching (see “Gotchas” below).
Dynamic Batching
Unity can also batch meshes that are not marked as static as long as they conform to the shared material requirement. If you have the Dynamic Batching option turned on, this process is mostly automatic. There is some overhead to compute the meshes to be batched every frame, but it almost always yields a significant net win in terms of performance.
Other Batching Issues
Note that there are a few other ways you can break batching. Drawing shadows and other multi-pass shaders requires a state switch and prevents objects from batching correctly. Multi-pass shaders can also cause the mesh to be submitted multiple times, and should be treated with caution on the Gear VR. Per-pixel lighting can have the same effect: using the default Diffuse shader in Unity 4, the mesh will be resubmitted for each light that touches it. This can quickly blow out your draw call and poly count limits. If you need per-pixel lighting, try setting the total number of simultaneous lights in the Quality Settings window to one. The closest light will be rendered per-pixel, and surrounding lights will be calculated using spherical harmonics. Even better, drop all pixel lights and rely on Light Probes. Also note that batching usually doesn’t work on skinned meshes. Transparent objects must be drawn in a certain order and therefore rarely batch well.
The good news is that you can actually test and tune batching in the editor. Both the Unity Profiler (Unity Pro only) and the Stats pane on the Game window can show you how many draw calls are being issued and how many are being saved by batching. If you organize your geometry around a very small number of textures, make sure that you do not instance your materials, and mark static objects with the Static Flag, you should be well on your way to a very efficient scene.
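Alongside the Stats pane, a quick way to check whether your scene is organized around a small number of materials is to count them. This is only a sketch: the class name MaterialCounter is my own, and the count is a rough proxy for batching potential, not a draw call measurement.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical diagnostic: logs how many distinct materials the active
// renderers in the scene reference.
public class MaterialCounter : MonoBehaviour
{
    void Start()
    {
        HashSet<Material> uniqueMaterials = new HashSet<Material>();
        Renderer[] renderers = FindObjectsOfType(typeof(Renderer)) as Renderer[];

        foreach (Renderer r in renderers)
        {
            // sharedMaterials is read here on purpose: reading .materials
            // would instantiate copies and distort the count.
            foreach (Material m in r.sharedMaterials)
            {
                if (m != null)
                {
                    uniqueMaterials.Add(m);
                }
            }
        }

        Debug.Log(renderers.Length + " renderers reference " +
                  uniqueMaterials.Count + " unique materials");
    }
}
```

If the number of unique materials is close to the number of renderers, very little of the scene can batch, whatever the Player Settings say.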
Transparency, Alpha Test, and Overdraw
As discussed above, mobile chipsets are often “fill-bound,” meaning that the cost of filling pixels can be the most expensive part of the frame. The key to reducing fill cost is to try to draw every pixel on the screen only once. Multi-pass shaders, per-pixel lighting effects (such as Unity’s default specular shader), and transparent objects all require multiple passes over the pixels that they touch. If you touch too many of these pixels, you can saturate the bus.
As a best practice, try to limit the Pixel Light Count in Quality Settings to one. If you use more than one per-pixel light, make sure you know which geometry it is being applied to and the cost of drawing that geometry multiple times. Similarly, strive to keep transparent objects small. The cost here is touching pixels, so the fewer pixels you touch, the faster your frame can complete. Watch out for transparent particle effects like smoke that may touch many more pixels than you expect with mostly-transparent quads.
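The Pixel Light Count can also be set from script, which is handy if you want to enforce it regardless of the active quality level. A minimal sketch, assuming a hypothetical component named PixelLightLimiter placed in your first scene:

```csharp
using UnityEngine;

// Hypothetical startup component: caps per-pixel lights at one.
public class PixelLightLimiter : MonoBehaviour
{
    void Awake()
    {
        // Same value as Pixel Light Count in the Quality Settings window.
        // Set it to 0 if you rely on Light Probes and want no pixel lights.
        QualitySettings.pixelLightCount = 1;
    }
}
```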
Also note that you should never use alpha test shaders, such as Unity’s cutout shader, on a mobile device. The alpha test operation (as well as clip(), or an explicit discard in the fragment shader) forces some common mobile GPUs to opt out of certain hardware fill optimizations, making it extremely slow. Discarding fragments mid-pipe also tends to cause a lot of ugly aliasing, so stick to opaque geometry or alpha-to-coverage for cutouts.
Performance Throttling
Before you can test the performance of your scene reliably, you need to ensure that your CPU and GPU throttling settings are set. Because VR games push mobile phones to their limit, you are required to select a weighting between the CPU and GPU. If your game is CPU-bound, you can downclock the GPU in order to run the CPU at full speed. If your app is GPU-bound you can do the reverse. And if you have a highly efficient app, you can downclock both and save your users a bunch of battery life to encourage longer play sessions. See “Power Management” in the Mobile SDK documentation for more information about CPU and GPU throttling.
The important point here is that you must select a CPU and GPU throttle setting before you do any sort of performance testing. If you fail to initialize these values, your app will run in a significantly downclocked environment by default. Since most Gear VR applications tend to be bound on CPU-side driver overhead (like draw call submission), it is common to set the clock settings to favor the CPU over the GPU. An example of how to initialize throttling targets can be found in OVRModeParams.cs, which you can copy and paste into a script that executes on game startup.
Gotchas
Here are some tricky things you should keep in the back of your mind while considering your performance profile.
- One particular device profile, specifically the Snapdragon-based Note 4 running Android 5, is slower than everything else; the graphics driver seems to contain a regression related to draw call submission. Games that are already draw call bound may find that this new overhead (which can be as much as a 20% increase in draw call time) is significant enough to cause regular pipeline stalls and drop the overall frame rate. We’re working hard with Samsung and Qualcomm to resolve this regression. Snapdragon-based Note 4 devices running Android 4.4, as well as Exynos-based Note 4 and S6 devices, are unaffected.
- Though throttling the CPU and GPU dramatically reduces the amount of heat generated by the phone, it is still possible for heavy-weight applications to cause the device to overheat during long play sessions. When this happens, the phone warns the user, then dynamically lowers the clock rate of its processors, which usually makes VR applications unusable. If you are working on performance testing and manage to overheat your device, let it sit without the game running for a good five minutes before testing again.
- Unity 4 Free does not support static batching or the Unity Profiler. However, Unity 5 Personal Edition does.
- The S6 does not support anisotropic texture filtering.
That’s all for now. In the next post, we’ll discuss how to go about debugging real-world performance problems.
For more information on optimizing your Unity mobile VR development, see “Best Practices: Mobile” in our Unity Integration guide.