A few things fundamentally affect VR comfort, and latency is obviously one of them. The lower the latency, the closer you can get to a real-life experience in VR, so it makes sense for us to keep improving it until the number becomes 0 someday (fingers crossed). We recently implemented a late latching solution for Quest with UE4.23 + Vulkan, which can help reduce your app’s rendering latency. While you can enable it by simply checking a checkbox in the Oculus GitHub UE4 build, in this article we provide an overview of the solution, its design and implementation, why it’s important for Quest development, and a few tips for getting started with this new developer feature.
Why do we need late latching on Quest?
When thinking about HMD latency reduction, late latching seems like a very natural and straightforward idea. Check out this article from 2015, as it does a great job of providing an overview of late latching as a concept.
You might ask, “Why aren’t there any published apps with late latching enabled? And if we didn’t need it before on Rift, why do we need it now for Quest?” These questions can be answered from two perspectives: one from hardware, one from software.
First, late latching works more efficiently on Quest than on Rift due to GPU architecture. Unlike a desktop GPU, which can start working on GPU commands right after the render thread generates them, the Quest GPU (Adreno 540) uses a tiled architecture, which needs to gather the whole frame’s draw call information before kicking off the GPU. The CPU-to-GPU latency is therefore inherently higher, which means late latching has more to gain.
Second, Vulkan has finally matured for development on mobile hardware. Vulkan offers explicit memory management and improved CPU-GPU synchronization management, both of which make implementing a robust late latching solution possible.
Design and implementation of late latching
The whole idea of late latching is to safely re-update tracking-dependent rendering data (usually the uniform buffer storing view matrices) before the GPU begins using it.
First, we need to be able to re-update tracking-dependent data after draw calls have been submitted. On GLES, this can be difficult: the driver manages uniform buffer memory, so you might have to rely on vendor-specific extensions, which is less than ideal for a game engine integration. On Vulkan, however, it is much more straightforward, as we always manage memory ourselves. For this case, we just need to make the view uniform buffer HOST_VISIBLE and HOST_COHERENT; then we can map it once, save the pointer, and modify it on the CPU at any time.
Second, we need to do it safely. We have to guarantee that we are not modifying the data while the GPU is reading it; otherwise, we might see a number of unexpected behaviors. Depending on the game engine’s rendering submission behavior and how much control we have, we can handle this with one of the following two options.
Protected approach
If we treat the game engine as a black box, one way to update tracking-dependent rendering data is through CPU-GPU synchronization. We simply add vkCmdWaitEvents before rendering anything, to hold the GPU in case there is any mid-frame flushing. We then re-update the view uniform buffer at the end of the frame and call vkSetEvent directly afterward to let the GPU go. This covers all situations, even if the GPU was implicitly or explicitly kicked off in the middle of the frame.
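The ordering guarantee of the protected approach can be illustrated with a CPU-only toy model. This is a sketch and not real Vulkan code: ToyGpu, Command, and SimulateProtectedFrame are hypothetical names, the boolean flag stands in for a VkEvent, and a single float stands in for the view uniform buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-only toy model of the protected approach. All names are hypothetical.
struct Command {
    enum Kind { WaitEvent, Draw } kind;
};

struct ToyGpu {
    bool eventSignaled = false;    // stands in for a VkEvent
    float viewPose = 0.0f;         // stands in for the view uniform buffer
    std::size_t next = 0;          // index of the next command to execute
    std::vector<float> posesSeen;  // pose value each draw actually read

    // Execute commands until the stream ends or a wait blocks on the event.
    void Run(const std::vector<Command>& cmds) {
        while (next < cmds.size()) {
            if (cmds[next].kind == Command::WaitEvent) {
                if (!eventSignaled) return;  // the "GPU" stalls here
            } else {
                posesSeen.push_back(viewPose);
            }
            ++next;
        }
    }
};

// One frame: a wait is recorded first, then the draws. A mid-frame flush
// starts the "GPU" early, but it cannot pass the wait until the CPU has
// written the late pose and signaled the event.
inline std::vector<float> SimulateProtectedFrame(float earlyPose, float latePose,
                                                 int drawCount) {
    std::vector<Command> cmds;
    cmds.push_back({Command::WaitEvent});
    for (int i = 0; i < drawCount; ++i) cmds.push_back({Command::Draw});

    ToyGpu gpu;
    gpu.viewPose = earlyPose;
    gpu.Run(cmds);                  // mid-frame flush: blocks on the wait
    assert(gpu.posesSeen.empty());  // nothing has read the buffer yet

    gpu.viewPose = latePose;        // late latch: rewrite the pose on the CPU
    gpu.eventSignaled = true;       // plays the role of vkSetEvent
    gpu.Run(cmds);                  // resumes and reads only the new pose
    return gpu.posesSeen;
}
```

Calling SimulateProtectedFrame(1.0f, 2.0f, 3) returns three entries that all hold the late pose, which is the property the wait/set pair is meant to provide even when a flush happens mid-frame.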
Simplified approach
For the UE4 Mobile Renderer, the situation is much simpler. Normally, there is no mid-frame flushing and only a single VkQueue is used. This guarantees that the GPU won’t work on the draw calls generated by the render thread until vkQueueSubmit, so it’s fine to re-update the view uniform buffer with newly fetched HMD poses right before vkQueueSubmit.
Because UE4 is an open source engine, some kind of mid-frame submission might be added to it later. To detect this, we added a late latching abort system. If a mid-frame submission is detected, late latching is aborted for that frame and the warning “Late Latching aborting…” is printed. Please keep an eye out for it: if your app prints a lot of logs with this warning, it might mean your local modification conflicts with the simplified late latching implementation. In that case, you might need to add the vkCmdWaitEvents / vkSetEvent mechanism discussed earlier.
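As a rough illustration of the abort logic, here is a minimal sketch of a per-frame guard. It is not the engine's actual code: LateLatchGuard and its methods are hypothetical names, and the sketch counts aborted frames rather than reproducing the engine's warning text.

```cpp
#include <cassert>

// Hypothetical per-frame guard for the simplified approach.
class LateLatchGuard {
public:
    // Call at the start of each render-thread frame.
    void BeginFrame() { submittedMidFrame_ = false; }

    // Call whenever command buffers are submitted before the final submit.
    void NotifyMidFrameSubmit() { submittedMidFrame_ = true; }

    // Call right before the final submit. Returns true if it is still safe
    // to patch the view data, false if late latching must be skipped.
    bool TryLatch() {
        if (submittedMidFrame_) {
            ++abortCount_;  // the real system logs a warning at this point
            return false;
        }
        return true;
    }

    int AbortCount() const { return abortCount_; }

private:
    bool submittedMidFrame_ = false;
    int abortCount_ = 0;
};
```

A frame with no early submission latches normally. A frame that calls NotifyMidFrameSubmit skips the patch, and the next frame starts clean after BeginFrame.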
UE4 implementation detail
In the previous section, we provided an overview of how late latching was designed. The actual UE4 implementation has some interesting details, in case you want to read the code:
“Late update”
UE4 already has a system called “Late update” for VR, which gives the engine a final opportunity to update tracking-dependent rendering data at the beginning of the render thread. The system is well suited to be reused for late latching. You can think of late latching as the “Late Late update”.
Mobile Vulkan uniform buffer
UE4’s mobile Vulkan uniform buffer handling is quite unique. In most engines, there are multiple uniform buffers and each has its own purpose. UE4 instead has one single gigantic uniform buffer (8 MB) for all draw calls, which is called the “packed uniform buffer system”. Each draw call allocates a small segment from the gigantic uniform buffer and packs all necessary uniform buffer data into it. This solution can dramatically reduce the number of PSOs. However, it creates extra complexity for late latching: we can’t just update one shared view uniform buffer, so we have to track where all those view matrices go and patch them one by one at the end of the render thread. This increases the overhead of late latching, but fortunately it wasn’t significant for most apps we tested, with minimal CPU overhead (0.1ms - 0.3ms).
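The "track and patch" idea can be sketched on the CPU as follows. This is a simplified model under stated assumptions, not UE4's implementation: PackedUniformBuffer and its methods are hypothetical names, a plain byte vector stands in for the persistently mapped buffer, and uniform alignment rules are ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Mat4 = std::array<float, 16>;

// Hypothetical model of a packed uniform buffer with late-latch tracking.
class PackedUniformBuffer {
public:
    explicit PackedUniformBuffer(std::size_t sizeBytes) : storage_(sizeBytes) {}

    // Reserve a segment for one draw call and return its byte offset.
    std::size_t Allocate(std::size_t bytes) {
        assert(head_ + bytes <= storage_.size());
        const std::size_t offset = head_;
        head_ += bytes;
        return offset;
    }

    // Copy a view matrix into the buffer and remember where it went.
    void WriteViewMatrix(std::size_t offset, const Mat4& m) {
        assert(offset + sizeof(Mat4) <= storage_.size());
        std::memcpy(storage_.data() + offset, m.data(), sizeof(Mat4));
        viewMatrixOffsets_.push_back(offset);
    }

    // Late latch: overwrite every recorded copy with the newest matrix.
    // Returns the number of copies patched.
    std::size_t PatchViewMatrices(const Mat4& latest) {
        for (std::size_t offset : viewMatrixOffsets_) {
            std::memcpy(storage_.data() + offset, latest.data(), sizeof(Mat4));
        }
        return viewMatrixOffsets_.size();
    }

    // Read a matrix back, for inspection only.
    Mat4 ReadMatrix(std::size_t offset) const {
        assert(offset + sizeof(Mat4) <= storage_.size());
        Mat4 m;
        std::memcpy(m.data(), storage_.data() + offset, sizeof(Mat4));
        return m;
    }

    // Start a new frame: reuse the buffer and forget old patch locations.
    void ResetFrame() {
        head_ = 0;
        viewMatrixOffsets_.clear();
    }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t head_ = 0;
    std::vector<std::size_t> viewMatrixOffsets_;
};
```

The cost of PatchViewMatrices grows with the number of recorded copies, which is consistent with the small per-frame CPU overhead described above.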
Controller late latching
There are no major differences between HMD latching and controller latching from a graphics API point of view. However, controllers can have a relatively complex transform hierarchy. For example, if you have a sword with glowing gems along the blade and need to apply late latching throughout, you have to walk the hierarchy after the controller receives its new pose. Fortunately, a lot of this work can be shared with the late-update system as well.
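To make the hierarchy walk concrete, here is a deliberately simplified sketch. It assumes translation-only transforms (rotation and scale are omitted for brevity) and a flat node array in which every parent is stored before its children. Vec3, Node, and LatchControllerPose are hypothetical names, not engine types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 Add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }

struct Node {
    int parent;  // index of the parent node, or -1 for the controller root
    Vec3 local;  // offset relative to the parent (translation only)
    Vec3 world;  // resolved world-space position
};

// Re-resolve every attached node after the controller gets a newer pose.
// Requires that each parent appears before its children in the array.
inline void LatchControllerPose(std::vector<Node>& nodes, Vec3 newRootPosition) {
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        Node& n = nodes[i];
        if (n.parent < 0) {
            n.world = newRootPosition;
        } else {
            assert(static_cast<std::size_t>(n.parent) < i);
            n.world = Add(nodes[n.parent].world, n.local);
        }
    }
}
```

For the sword example, the controller would be the root, the blade a child, and each gem a child of the blade. One call after the new pose arrives moves all of them together.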
Late latching benefits + How to get started
How much latency late latching can save depends on the app’s render thread workload. Interestingly enough, the more computationally expensive the render thread, the more late latching can save, because the gap between the start of the render thread and the late latching point gets larger.
Generally, lower latency should improve VR comfort. Visually, it can also reduce several common graphics artifacts.
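The relationship can be written down as a small worked model. It is a simplification that assumes the regular late update samples the pose at the start of the render thread and late latching samples it just before the final submit. FrameTiming and the function names are hypothetical, and the numbers you feed in are your own measurements.

```cpp
#include <cassert>

// Hypothetical timestamps in milliseconds on a shared clock.
struct FrameTiming {
    double renderThreadStartMs;  // where the regular late update samples the pose
    double lateLatchPointMs;     // just before the final queue submit
    double displayTimeMs;        // when the frame reaches the display
};

// Age of the pose at display time, with or without late latching.
inline double PoseAgeMs(const FrameTiming& t, bool lateLatching) {
    const double sampleTime =
        lateLatching ? t.lateLatchPointMs : t.renderThreadStartMs;
    return t.displayTimeMs - sampleTime;
}

// Latency saved by late latching: the render-thread time that elapses
// between the two sampling points.
inline double SavedMs(const FrameTiming& t) {
    assert(t.lateLatchPointMs >= t.renderThreadStartMs);
    return PoseAgeMs(t, false) - PoseAgeMs(t, true);
}
```

In this model, a frame whose render thread runs for 8 ms before the latch point saves 8 ms, while one that runs for 2 ms saves only 2 ms, matching the observation that heavier render threads benefit more.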
Less TimeWarp black pulling
Rotational latency is sufficiently managed by TimeWarp, but lower prediction latency can still help to reduce the black pulling artifacts.
More spatially stable near-field objects
Positional latency affects near-field objects the most. If you are looking at a nearby object and make a translational movement with your head, the object might appear shaky. Reducing latency can help improve that situation.
Better controller experience
Shorter prediction latency helps controller movement as well. This should result in less controller overshooting and undershooting when the user waves the controllers.
Our late latching solution was shipped in the 1.41 release as an experimental feature. To enable it, download and build UE4.23 from our GitHub, then check the “Late Latching” checkbox under Project Settings.
Be sure to check “Support Vulkan” as well.
Quick Tip
An app’s rendering latency can be read from logcat. Type “adb logcat -s VrApi” and you will get logs like “I/VrApi (26422): FPS=72,Prd=45ms,Tear=0,Early=73,Stale=0,VSnc=1,Lat=1,Fov=0 ….”. “Prd” here is the app’s latest rendering latency. We have added a console variable, “r.ForceDisableLateLatching”, in UE4; if you toggle it dynamically and observe the Prd value, you can see how many milliseconds late latching has saved.
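If you capture these logs to a file, a small helper can pull out the Prd value for comparison. This is a minimal sketch that assumes the “Prd=<integer>ms” field format shown in the sample line above. ParsePrdMs and LateLatchingSavingMs are hypothetical helper names, not part of any SDK.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Extract the integer "Prd" value (milliseconds) from a VrApi log line.
// Returns -1 if the field is missing or has no digits.
inline int ParsePrdMs(const std::string& line) {
    const std::string key = "Prd=";
    std::size_t pos = line.find(key);
    if (pos == std::string::npos) return -1;
    pos += key.size();
    std::size_t end = pos;
    while (end < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[end]))) {
        ++end;
    }
    if (end == pos) return -1;
    return std::stoi(line.substr(pos, end - pos));
}

// Difference in Prd between a line captured with late latching disabled
// and one captured with it enabled. Returns -1 if either line is unusable.
inline int LateLatchingSavingMs(const std::string& lineDisabled,
                                const std::string& lineEnabled) {
    const int off = ParsePrdMs(lineDisabled);
    const int on = ParsePrdMs(lineEnabled);
    if (off < 0 || on < 0) return -1;
    return off - on;
}
```

Because Prd can vary from one log line to the next, averaging the parsed value over many lines in each mode is likely to give a steadier comparison than a single pair.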
We hope this latest feature helps you minimize latency in your next UE4 build. For more learnings and best practices specific to performance and graphic optimization, be sure to check out the articles below.