“You are defeated. Instead of shooting where I was, you should have shot where I was going to be. Muahahahaha!”
- Lrrr (character from Futurama, after invading Earth in Space Invaders style)

Introduction
We have all learned that latency is the bane of virtual reality.
Because your head freely rotates, presenting the correct image to
your eyes is like firing a bullet at a moving target. The target is
“sighted” by sensor fusion software, which provides the direction
you are currently looking (see
[8] for a survey of techniques). Latency is the time it takes
between your head moving to a new orientation and the correct image
arriving on your retinas. In real, old-fashioned reality, the
latency is effectively zero. In VR, latency is widely recognized as
a key source of disorientation and disbelief (the brain cannot be
fooled). In this post, I will argue that simple prediction
techniques can reduce latency so much that it is no longer the main
problem. Simply present the image that corresponds to where the
head is
going to be, rather than
where it was.
As enthusiasm for VR gaming has ramped up over the past year,
recent blog articles and talks by veterans John Carmack and Michael
Abrash have focused on latency as the key obstacle. In
[3],
Carmack provides a nice summary of the factors that contribute to
latency. He offers some strategies to reduce it, but does not emphasize prediction as a serious contender. In
[1], Abrash calls for a fundamental change in the way rendering and display technology are currently pursued so that latency can be reduced while maintaining high-fidelity images. I hope this happens
because it could improve VR experiences on all fronts! Prediction,
however, is discouraged due to increased error when the motion
direction changes.
How much latency is too much? Based on VR research during the
1990s, 60 milliseconds (ms) has been commonly cited as an upper
limit for acceptable VR. Even so, I definitely notice a disturbing
lag when the latency is greater than 40ms. Most people agree that
if latency is below 20ms, then the lag is no longer perceptible.
Abrash even calls for 7ms to 15ms to be safe. How close are we in
modern times? For a game running at 60 FPS, the latency when using
the Oculus Rift Development Kit is typically in the range from 30ms
to 50ms, including time for sensing, data arrival over USB, sensor
fusion, game simulation, rendering, and video output. Other
factors, such as LCD pixel switching times, game complexity, and
unintentional frame buffering may drive it higher, but it is
important to note that the latency period is generally shorter than
it was decades ago.
Why is prediction often not taken seriously? The most likely reason is that, through decades of VR research, it has become widely known as a double-edged sword: most of the time it works, but it makes catastrophic errors when the head motion abruptly changes.
This was true across a wide range of VR and AR systems; however,
the game has changed thanks to new technologies. The main factors
are:
- How far into the future do we need to predict?
- How far into the past do we need to look to estimate the
trajectory?
Regarding the first factor, Ronald Azuma’s highly cited 1995
thesis
[2] on
predictive tracking for AR insists on keeping the prediction
interval “short”: Below 80ms! If the latency in a current system is
only 50ms, then accurately predicting around 30 to 40ms into the
future would already tackle most of the problem, even satisfying
Abrash’s extreme demands.
To understand the effect of shortening the prediction interval, suppose that the head accelerates at a rate of $a$ deg/sec². After $t$ seconds, the angular velocity will change by $at$ deg/sec and the orientation will change by $at^2/2$ degrees. It is notoriously difficult to estimate time
derivatives, making it hard to accurately measure angular
acceleration [5]. Furthermore, the acceleration could change during
the prediction interval. In either case, the error could grow at
least quadratically with respect to the prediction interval length.
For an aggressive acceleration of 1000 deg/sec² that goes
unaccounted for (see the figure), the error in head orientation after 20ms is 0.2 degrees. After 40ms, it is already 0.8 degrees, and after 80ms it grows to 3.2 degrees. Therefore, predicting 20ms into
the future is
much easier than predicting 40ms, and 40ms is
much easier than 60 to 80ms. Also working to our advantage in
shorter intervals is the fact that the head, while wearing a VR
headset, has significant momentum. This has a smoothing effect that prevents excessive changes in the rotation rate.
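To make the arithmetic concrete, here is a minimal Python sketch (purely illustrative) that evaluates the $at^2/2$ error for a few prediction intervals:

```python
# Orientation error (degrees) caused by an unmodeled angular
# acceleration a (deg/sec^2) over a prediction interval t (sec):
# error = a * t^2 / 2.
def prediction_error_deg(a, t):
    return 0.5 * a * t ** 2

for t_ms in (20, 40, 80):
    # Aggressive, unaccounted-for acceleration of 1000 deg/sec^2.
    err = prediction_error_deg(1000.0, t_ms / 1000.0)
    print(t_ms, "ms:", err, "degrees")  # prints 0.2, 0.8, and 3.2
```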
Now consider the second factor: If we need to look too far into
the past to reliably estimate the trend, then an implicit latency
is built into the estimate. Suppose that optical tracking is
performed using cameras at a nice rate of 60 frames per second.
This means an estimate of the head orientation arrives every
16.67ms. If these estimates are noisy, and we would furthermore
like to know how the orientation is changing, then several
measurements are needed. If we use 6 samples, then we have reached
100ms into the past to determine the trend, effectively lengthening
the prediction interval.
We can thank the smartphone industry for helping with this
problem. MEMS-based sensors continue to improve, providing
accurate, high-frequency measurements in a tiny, low-cost package.
Modern gyroscopes provide angular velocity measurements at 1000Hz.
Even at this incredible rate, the measurements are prefiltered
to reduce noise; raw measurements can be obtained at around
10,000Hz. So, only 1ms of MEMS gyroscope data may be more
informative than looking 100ms into the past with an optical
tracking system. Better yet, a few milliseconds of gyroscope data
could enable accelerations or higher-order trends to be
estimated.
To summarize, the game has changed: Trackers do not need to
predict as far into the future, and they barely need to look into
the past.
Some Technical Details
Predictive tracking or filtering is an old idea, extending back
to the early days of signal processing and control theory. Let the
state refer to the quantity that we would like to track and
predict. A classical example is the position, orientation, and
velocity of an aircraft. Predictive filtering is based on three
parts:
- The sensor readings up until now.
- A model of what each reading tells about the state at that
time.
- A model of how the state changes over time.
To keep it simple, let’s suppose that #1 and #2 cause no
trouble: A sensor reading directly provides the current state. If
we obtain sensor readings at regular intervals (for example, every
10 ms), then what will the state be one step into the future? Let $x_i$ refer to the $i$th reading. A linear prediction approach looks like

$\hat{x}_{i+1} = a_0 x_i + a_1 x_{i-1} + \cdots + a_k x_{i-k},$

in which $a_0, a_1, \ldots, a_k$ are constants chosen in advance. They provide the predictive model (#3 above). For example, $\hat{x}_{i+1} = x_i$ is a simple model which predicts that the state will never change. The model $\hat{x}_{i+1} = 2 x_i - x_{i-1}$ predicts that the state changes at a fixed rate.
More complicated linear prediction filters can handle other
factors, such as noise reduction and state acceleration.
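In code, the two simple models above might look like the following sketch (Python; the variable names are purely illustrative):

```python
def predict_constant(history):
    """Model x_{i+1} = x_i: assumes the state never changes."""
    return history[-1]

def predict_constant_rate(history):
    """Model x_{i+1} = 2*x_i - x_{i-1}: extrapolates the last step."""
    return 2 * history[-1] - history[-2]

readings = [0.0, 1.0, 2.1, 3.0]           # hypothetical sensor readings
print(predict_constant(readings))          # 3.0: state stays put
print(predict_constant_rate(readings))     # 3.9: last trend continues
```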
Linear prediction is just one type of filter among many others.
For example,
Bayesian
filters use probabilistic modeling (in #2 and #3 above) to
arrive at distributions over possible current states and future
states. Heavier weights are given to more likely futures. The
celebrated Kalman filter is a famous special case of Bayesian
filters for which all of the distributions become Gaussian and
there is a nice update formula for each step. The most basic and
general way to view filter design is in terms of
information
states, an idea introduced by von Neumann and Morgenstern [7]:
When playing an iterative game with uncertain current state, all
past information is compressed into some representation that is
critical for decision making. In our case, the “decision” is
specifying the future state. The information state is updated in
each stage, and forms the basis for a decision. Think about what
you need to keep track of to play
Battleship
effectively.
Card counting
strategies for Blackjack are another good example. Finally, what
information state should a game AI maintain? For filtering from an
information-state perspective, see Chapter 11 of
my book.
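As a toy illustration of such an update formula, a scalar Kalman filter step might look like the following sketch (generic textbook form; the noise variances are arbitrary placeholders):

```python
def kalman_step(x_est, p, z, q=1e-4, r=1e-2):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p: state estimate and its variance (the information state).
    z: new sensor reading; q, r: process and measurement noise variances.
    """
    p_pred = p + q                      # predict: uncertainty grows (#3)
    k = p_pred / (p_pred + r)           # Kalman gain: trust in the reading
    x_new = x_est + k * (z - x_est)     # update: blend the reading in (#2)
    return x_new, (1.0 - k) * p_pred    # compact summary of all past data
```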
Now consider tracking head orientation, which means the state is a quaternion that represents head orientation. From my previous blog post, the orientation is updated every millisecond by calculating a quaternion that represents the rotation that occurred over that small time interval $\Delta t$. The critical piece of information is the current angular velocity, as measured using a gyroscope.
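In code, one millisecond of this update might look like the following sketch (Python with numpy; the helper names are mine and purely illustrative):

```python
import numpy as np

def quat_from_angular_velocity(omega, dt):
    """Quaternion (w, x, y, z) for rotating at angular velocity
    omega (rad/s, 3-vector) over a small interval dt."""
    angle = np.linalg.norm(omega) * dt           # total rotation angle
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])    # effectively no rotation
    axis = omega / np.linalg.norm(omega)
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))

def quat_multiply(q, r):
    """Hamilton product q * r, both in (w, x, y, z) order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

# Each millisecond: apply the measured rotation to the current orientation.
q = np.array([1.0, 0.0, 0.0, 0.0])     # current head orientation
omega = np.array([0.0, 2.0, 0.0])      # gyroscope reading, rad/s
dt = 0.001                             # 1ms update interval
q = quat_multiply(q, quat_from_angular_velocity(omega, dt))
```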
Consider the following methods:
- No prediction: Just present the updated quaternion to
the renderer.
- Constant rate: Assume the currently measured angular
velocity will remain constant over the latency interval.
- Constant acceleration: Estimate angular acceleration and
adjust angular velocity accordingly over the latency interval.
The first method seems absurd because it assumes that the head will immediately come to a complete stop and remain that way over the latency interval (recall $\hat{x}_{i+1} = x_i$). The second method extends the rotation rate over the latency interval (recall $\hat{x}_{i+1} = 2 x_i - x_{i-1}$, but now we use the angular velocity). If the rotation rate remains constant, then the rotation axis is unchanged. We only need to extend the rotation angle about that axis to account for the longer time interval. To predict 20ms into the future, simply replace $\Delta t$ with $\Delta t + 0.020$ in the update.
The third method allows the angular velocity to change at a linear
rate when looking into the future. The angular acceleration is
estimated from the change in gyroscope data. For each small step
1ms into the future, the acceleration is applied to change the
predicted angular velocity. For example, if the head is
decelerating, then its predicted angular velocity will be smaller
in each time step along the latency interval. The figure shows
their differences in terms of calculated angular velocity over the
prediction interval.
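Reusing the helpers from the sketch above, Methods 2 and 3 might be implemented roughly as follows (an illustrative sketch with a hypothetical 20ms latency interval; the actual SDK code may differ):

```python
def predict_constant_rate(q, omega, dt, latency):
    """Method 2: extend the current rotation rate over dt + latency."""
    return quat_multiply(q, quat_from_angular_velocity(omega, dt + latency))

def predict_constant_accel(q, omega, alpha, dt, latency, step=0.001):
    """Method 3: let the angular velocity drift at the estimated angular
    acceleration alpha (rad/s^2, 3-vector) across the latency interval."""
    q_pred = quat_multiply(q, quat_from_angular_velocity(omega, dt))
    w = omega.copy()
    t = 0.0
    while t < latency:
        w = w + alpha * step            # velocity changes at a linear rate
        q_pred = quat_multiply(q_pred, quat_from_angular_velocity(w, step))
        t += step
    return q_pred

q_pred = predict_constant_rate(q, omega, dt, latency=0.020)
```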
One remaining detail is noise reduction. Errors in estimating the
current angular velocity tend to be amplified when making
predictions over a long time interval. Vibrations derived from
noise are particularly noticeable when the head is not rotating
quickly. Therefore, we use simple smoothing filters in the
estimation of current angular velocity (Methods 2 and 3) and
current angular acceleration (Method 3). We use
Savitzky-Golay filters, but many other methods should work just
as well.
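For example, using scipy, smoothing one gyroscope axis and estimating its acceleration might look like this sketch (the window length and polynomial order are illustrative choices, not our tuned values):

```python
import numpy as np
from scipy.signal import savgol_filter

dt = 0.001                                    # 1000Hz gyroscope samples
# A synthetic angular-rate signal (rad/s) standing in for real gyro data.
t = np.arange(0.0, 0.1, dt)
gyro_y = 2.0 * np.sin(10.0 * t) + 0.01 * np.random.randn(t.size)

# Smoothed angular velocity: fit a quadratic over a 21-sample window.
w_smooth = savgol_filter(gyro_y, window_length=21, polyorder=2)
# Angular acceleration: first derivative of the same local fit.
a_smooth = savgol_filter(gyro_y, window_length=21, polyorder=2,
                         deriv=1, delta=dt)
```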
Performance
A simple way to evaluate performance is to record predicted
values and compare them to the current estimated value after the
prediction interval has passed. Note that this does not compare to
actual ground truth, but it is very close because the drift error
rate from gyroscope integration is very small over the prediction
interval. I’ve compared the performance of several methods with
prediction intervals ranging from 20ms to 100ms. The following
graph shows error in terms of degrees, for a prediction interval of
20ms, using the Oculus Rift sensor over a 3 second interval:
I was wearing the Rift and turning my head back and forth, with
a peak rate of about 240 deg/sec, which is fairly fast. This is
close to reported peak velocities in published VR studies [4,6].
The blue line represents Method 1 (no prediction), which performs
the worst. The red line shows Method 2 (constant rate), which is
much improved. The yellow line shows Method 3 (constant
acceleration), which performs the best in the comparison. Method 1
is used by default in the original SDK release for the Rift
Development Kit, but with prediction turned on, Method 2 is used. A
variant of Method 3 is expected to appear in an upcoming
release.
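For reference, the errors reported below are angles between each predicted orientation and the estimate obtained once the interval has passed; for unit quaternions that angle can be computed as in this short sketch:

```python
import numpy as np

def angular_error_deg(q_pred, q_actual):
    """Angle (degrees) of the rotation taking q_pred to q_actual.
    Both arguments are unit quaternions as 4-vectors; the absolute
    value handles the q / -q double-cover ambiguity."""
    d = min(abs(np.dot(q_pred, q_actual)), 1.0)   # |cos(theta / 2)|
    return np.degrees(2.0 * np.arccos(d))
```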
Numerically, the angular errors for predicting 20ms into the
future are:
| Method | Average error (degrees) | Worst error (degrees) |
|--------|-------------------------|------------------------|
| 1      | 1.46302                 | 4.77900                |
| 2      | 0.19395                 | 0.71637                |
| 3      | 0.07596                 | 0.35879                |
During these motions, the acceleration peaked at around 850
deg/sec². The fastest I could rotate my head while wearing the Rift
was about 600 deg/sec, with peak accelerations of around 20,000
deg/sec² (and my neck still hurts as I am typing this). By
flipping the Rift in my hands and catching it again, I was able to
obtain 1400 deg/sec and 115,000 deg/sec²; however, these speeds are
unreasonable! Typical, slower motions, which are common in game
play, yield around 60 deg/sec in velocity and 500 deg/sec² in peak
accelerations. For both slow and fast motions with a 20ms
prediction interval, Method 3 is generally superior to the others. If we double the prediction interval, then performance degrades; however, the prediction methods remain preferable to no prediction at all. For head motions similar to those above, the results for 40ms
prediction are:
| Method | Average error (degrees) | Worst error (degrees) |
|--------|-------------------------|------------------------|
| 1      | 3.36267                 | 9.68985                |
| 2      | 0.57410                 | 1.59862                |
| 3      | 0.17338                 | 0.50788                |
Discussion
During ordinary game play, even with some fast head motions,
simple prediction techniques accurately predict 20 to 40ms into the
future. Subtracting this time from the actual latency results in an
effective latency that is well below 20ms. Hooray! It
appears that the holy grail has been reached. But not really. As
mentioned before, other factors may drive the actual latency
higher. Also, the effect of small prediction errors is difficult to
assess. This ties directly into perception, which is an
important topic missing from the discussion so far. For example,
when the head is almost stationary, small perturbations are more
noticeable than when the head quickly rotates. How much error is
imperceptible and how does this vary with respect to angular
velocity, acceleration, screen resolution, shading methods, and so
on? How important is the direction of the error as it
propagates over time? Answers to these questions would help to
further improve prediction methods. At the same time, improvements
in computation power, software, rendering, and display technologies
(OLEDs) are expected to reduce the actual latency, which would
further shorten the required prediction interval.
Latency is no longer the powerful beast that it once was. It has
been beaten down and nearly defeated by modern sensing technology
and effective filtering techniques. This will cause attention to
shift to a host of other problems. If latency is no longer causing
simulator sickness, then what about the game content? A fast ride
on a virtual roller coaster may cause more disorientation than
latency or other VR system artifacts. Furthermore, what kind of
user interfaces are most appropriate? What game genres will emerge
to provide the best VR experience? As display resolution and
switching speeds improve, how should judder be addressed? The list
goes on and on. Exciting times lie ahead!
Acknowledgments
I am grateful to Tom Forsyth, Peter Giokaris, Nate Mitchell, and
Laurent Scallie for helpful discussions.
References
[1] Abrash, Michael, Latency: The sine qua non of AR and VR, blog post, December 2012.
[2] Azuma, Ronald T., Predictive Tracking for Augmented Reality, Ph.D. thesis, University of North Carolina at Chapel Hill, 1995.
[3] Carmack, John, Latency Mitigation Strategies, blog post, February 2013.
[4] List, Uwe H., Nonlinear Prediction of Head Movements for Helmet-Mounted Displays, Technical report AFHRL-TP-83-45, Williams AFB, AZ: Operations Training Division, Air Force Human Resources Laboratory, 1983.
[5] Ovaska, S. J., and Valiviita, S., Angular Acceleration
Measurement: A Review, IEEE Transactions on Instrumentation and
Measurement, Volume 47, Number 5, Pages 1211-1217, 1998.
[6] Smith Jr., Bernard R., Digital Head Tracking and Position Prediction for Helmet Mounted Visual Display Systems, Proceedings of the AIAA 22nd Aerospace Sciences Meeting, 1984.
[7] von Neumann, John, and Morgenstern, Oskar, Theory of Games
and Economic Behavior, Princeton University Press, 1944.
[8] Welch, Greg, and Foxlin, Eric, Motion Tracking: No Silver Bullet, but a Respectable Arsenal, IEEE Computer Graphics and Applications, Volume 22, Number 6, Pages 24-38, 2002.