This is the first article in a series reviewing new functionality in the Audio SDK. This post is a high-level overview of our near-field rendering tech.
Binaural 3D audio works by applying a unique filter to the sound for each ear, based on the 3D position of the sound source. The term “filter” can describe very different things, from simple EQ all the way to complex reverberation. So what are we talking about here?
Just as a reverberation filter captures in its binaural impulse response (IR) all the ways a sound can interact with the surrounding environment on its way to the listener’s ears, a binaural spatialization filter captures all the ways a sound can interact with the listener's body on its way to the ears.
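To make that concrete, here is a minimal sketch of binaural spatialization as two convolutions of the same mono source, one per ear, with a left/right impulse-response pair. The function names and the direct time-domain convolution are illustrative only, not our SDK's API; a real engine would use partitioned FFT convolution, but the principle is identical.

```cpp
#include <cstddef>
#include <vector>

// Direct (time-domain) convolution of a mono signal with one ear's
// impulse response: out = source * ir.
std::vector<float> convolve(const std::vector<float>& source,
                            const std::vector<float>& ir)
{
    std::vector<float> out(source.size() + ir.size() - 1, 0.0f);
    for (std::size_t n = 0; n < source.size(); ++n)
        for (std::size_t k = 0; k < ir.size(); ++k)
            out[n + k] += source[n] * ir[k];
    return out;
}

struct BinauralOut { std::vector<float> left, right; };

// Binaural spatialization: the same mono signal is filtered by a
// different impulse response for each ear.
BinauralOut spatialize(const std::vector<float>& mono,
                       const std::vector<float>& hrirLeft,
                       const std::vector<float>& hrirRight)
{
    return { convolve(mono, hrirLeft), convolve(mono, hrirRight) };
}
```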
In the reverberation case, the IRs are much longer and more chaotic due to the size and complexity of the environment. We’ve been taking advantage of this for years to approximate an environment with a single binaural reverb IR, because beyond the first few bounces, spatialization is buried in a fading chaos that we perceive unconsciously as a diffuse connection to our surrounding environment.
In the binaural 3D spatialization case, the IRs are tiny, but extremely directional. Beyond a few feet, the IRs don’t change much with distance. We’ve been taking advantage of this to make another approximation of 3D audio that is independent of distance and that we call “far-field”. Our HRTF database is captured/sampled around the head as a grid on a sphere rather than a volume.
We’re spatializing along azimuth and elevation angles, but not distance.
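As an illustration of what “direction only” means in practice, a far-field HRTF set can be queried with nothing more than (azimuth, elevation); distance never enters the key. The layout and nearest-neighbor lookup below are a simplified stand-in for our actual database, which also interpolates between neighboring samples.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// One measured filter pair on the far-field sphere.
struct HrtfSample {
    float azimuthDeg;            // horizontal angle around the head
    float elevationDeg;          // vertical angle
    std::vector<float> irLeft;   // left-ear impulse response
    std::vector<float> irRight;  // right-ear impulse response
};

// Convert (azimuth, elevation) to a unit direction vector.
static void toUnitVector(float azDeg, float elDeg, float v[3])
{
    const float d2r = 3.14159265f / 180.0f;
    v[0] = std::cos(elDeg * d2r) * std::sin(azDeg * d2r);
    v[1] = std::cos(elDeg * d2r) * std::cos(azDeg * d2r);
    v[2] = std::sin(elDeg * d2r);
}

// Nearest-neighbor lookup on the sphere (assumes a non-empty grid).
// Note that the query is (azimuth, elevation) only: no distance term.
const HrtfSample& lookupFarField(const std::vector<HrtfSample>& grid,
                                 float azDeg, float elDeg)
{
    float q[3];
    toUnitVector(azDeg, elDeg, q);

    const HrtfSample* best = &grid.front();
    float bestDot = -std::numeric_limits<float>::infinity();
    for (const HrtfSample& s : grid) {
        float v[3];
        toUnitVector(s.azimuthDeg, s.elevationDeg, v);
        const float dot = q[0] * v[0] + q[1] * v[1] + q[2] * v[2]; // cos(angle)
        if (dot > bestDot) { bestDot = dot; best = &s; }
    }
    return *best;
}
```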
Distance is addressed by a separate, dedicated model (sketched in code below):
- rolloff attenuation curves
- medium absorption filtering
- wet/dry balance
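Here is a rough sketch of how those three distance controls might be combined. Every curve shape and constant below (reference distance, absorption decay, wet/dry ramp) is a placeholder for illustration, not the SDK's actual tuning.

```cpp
#include <algorithm>
#include <cmath>

// Placeholder distance model: a rolloff gain for the dry signal, a
// lowpass cutoff standing in for medium (air) absorption, and a
// wet/dry balance driving the reverb send.
struct DistanceParams {
    float dryGain;       // linear gain applied to the spatialized signal
    float absorptionHz;  // lowpass cutoff simulating air absorption
    float wetDryRatio;   // 0 = fully dry, 1 = fully wet (reverb)
};

DistanceParams distanceModel(float distanceMeters)
{
    DistanceParams p;

    // Inverse-distance rolloff, clamped below a 1 m reference distance
    // so the gain never exceeds unity.
    const float ref = 1.0f;
    p.dryGain = ref / std::max(distanceMeters, ref);

    // Air absorption: progressively darker with distance (constants are
    // illustrative only).
    p.absorptionHz = 20000.0f * std::exp(-distanceMeters / 100.0f);

    // Wet/dry balance: the farther the source, the larger the
    // reverberant share of the mix.
    p.wetDryRatio = std::clamp(distanceMeters / 50.0f, 0.0f, 1.0f);

    return p;
}
```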
Near-field rendering begins with the acknowledgement that this model doesn’t work as well when the sound’s distance to the listener shrinks to the point of being comparable to the size of the human head. In that case, spatialization and distance modeling become closely intertwined and are better synthesized from an ear-centric, rather than a head-centric, spatial reference. In far-field, the center of the world is the center of our head. In near-field, the center of the world is the ear canal entrance, and we have two of them, which makes near-field even more “binaural” in some ways than far-field.
The near-field distance (the radius of the near-field sphere around the listener’s head) is commonly defined as ~0.5 - 1.0 m (~3 feet, “within arm's reach”). A logical evolution of our current far-field HRTF tech would be to extend it to near-field by adding more filter samples to the database (red dots) to fill up the entire near-field sphere volume all the way to the head boundary:
This will likely come down the line from R&D, but will take more resources. In the meantime, just like for the reverberation and far-field spatialization cases, we're looking for a perceptual approximation that runs fast on hardware with limited resources.
So, what's special about near-field audio?
For our approximation to work, we first have to identify the main perceptual cues of near-field rendering:
- getting closer means louder with the inverse-square law in free field
- but the loudness increase is mainly expressed as ILD (Interaural Level Difference): the head interferes with propagation, and a sound can be much closer to one ear than the other, generating much higher ILDs than in far-field (see the sketch after this list)
- increased head shadowing/diffraction:
- on the opposite (occluded) side: high frequencies are attenuated more than low frequencies
- overall, the effect is perceived as a subtle bass boost on top of the increased ILD.
- dryness of the signal: both early reflections and diffuse reverberation are strong perceptual cues of distance and must therefore be minimized
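The sketch below illustrates the first two cues with nothing but geometry: per-ear distances for a source in the horizontal plane, the inverse-square law applied to each ear independently, and the resulting ILD. The 0.0875 m head radius and the spherical-head layout are common textbook assumptions, and head shadowing is deliberately left out here.

```cpp
#include <cmath>
#include <cstdio>

// Per-ear levels for a source at (azimuthDeg, distanceMeters) in the
// horizontal plane. Head center at the origin, ears on the +/- x axis,
// headRadius ~ 0.0875 m is a common average. Illustration only.
struct EarLevels { float leftDb, rightDb, ildDb; };

EarLevels nearFieldIld(float azimuthDeg, float distanceMeters,
                       float headRadius = 0.0875f)
{
    const float az = azimuthDeg * 3.14159265f / 180.0f;

    // Source position: azimuth measured from the front, toward the right.
    const float sx = distanceMeters * std::sin(az);
    const float sy = distanceMeters * std::cos(az);

    // Distance from the source to each ear.
    const float dRight = std::hypot(sx - headRadius, sy);
    const float dLeft  = std::hypot(sx + headRadius, sy);

    // Free-field inverse-square law: level in dB is -20 * log10(distance).
    EarLevels e;
    e.rightDb = -20.0f * std::log10(dRight);
    e.leftDb  = -20.0f * std::log10(dLeft);
    e.ildDb   = std::fabs(e.rightDb - e.leftDb);
    return e;
}

int main()
{
    // A source 90 degrees to the right: the ILD grows as distance shrinks.
    const float distances[] = { 2.0f, 1.0f, 0.5f, 0.25f };
    for (float d : distances) {
        const EarLevels e = nearFieldIld(90.0f, d);
        std::printf("d = %.2f m  ILD = %.1f dB\n", d, e.ildDb);
    }
    return 0;
}
```

With these assumptions, the purely geometric ILD for a fully lateral source goes from under 1 dB at 2 m to over 6 dB at 25 cm; head shadowing then adds more on top.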
From this, the approximation will work better on the lateral sides (away from the median plane), where the ILDs and diffraction filters are strongest, and we need full control over the reflected signal gains (early reflections and late reverberation).
Also worth noting is the absence of ITD (Interaural Time Difference) specific cues: close proximity does not affect the timing difference between the ears in a perceivable way. A moving near-field source does, however, sweep through wider ITD and ILD variations than the same motion farther away (remember that pesky mosquito!).
Near-field rendering model
Notation used below:
- az: azimuth angle
- el: elevation angle
- d: sound distance to the listener
- a: head diameter
- The first step takes our far-field HRTF database (as usual) but reinterprets it geometrically from each ear rather than from the head center (see the sketch after this list).
- The next step is convolving our source signal as usual, but now with the near-field HRTF we just built. At this point, we’ve compensated for the directional error in our HRTF lookup, so the spatialization is more accurate, but it still sounds “far” because we’re using a far-field HRTF.
- Finally, we apply the physical modeling of the head-shadowing effect in real time.
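As a sketch of the first step, under a simplified geometry (this is not the SDK's exact math): the head-centric coordinates (az, el, d) are converted to Cartesian, the origin is shifted to each ear (offset by a/2 along the interaural axis), and the result is converted back, giving the per-ear angles used for the HRTF lookup and the per-ear distance.

```cpp
#include <algorithm>
#include <cmath>

// Head-centric source coordinates, matching the notation above:
// az/el in radians, d = distance to the head center; a = head diameter.
struct Spherical { float az, el, d; };

// Re-express the source position relative to one ear instead of the head
// center. earSign is +1 for the right ear, -1 for the left ear.
// Convention used here (illustrative): +x right, +y front, +z up,
// azimuth measured from the front toward the right.
Spherical toEarCentric(const Spherical& s, float a, float earSign)
{
    // Head-centric Cartesian position of the source.
    const float x = s.d * std::cos(s.el) * std::sin(s.az);
    const float y = s.d * std::cos(s.el) * std::cos(s.az);
    const float z = s.d * std::sin(s.el);

    // Shift the origin to the ear, sitting at +/- a/2 on the interaural axis.
    const float ex = x - earSign * (a * 0.5f);

    // Back to spherical coordinates, now as seen from that ear.
    Spherical out;
    out.d  = std::sqrt(ex * ex + y * y + z * z);
    out.az = std::atan2(ex, y);
    out.el = std::asin(z / std::max(out.d, 1e-6f));
    return out;
}
```

The two ear-centric angle pairs then feed two separate far-field HRTF lookups, which keeps the lookup direction correct even when the source sits right next to one ear.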
The key physical phenomenon at play here is acoustic diffraction: the bending of waves around rigid obstacles like the head.
This phenomenon is frequency dependent:
- low frequencies can bend around an obstacle
- high frequencies cannot
- the cutoff frequency depends on the size of the obstacle
It can be thought of as a binaural (each ear gets a different filtering effect), directional lowpass filter with a cutoff frequency directly related to head size and to the azimuth and elevation angles. Some of that filtering is already captured in our far-field HRTFs (head diffraction is not restricted to near-field), so we’re subtly accentuating the effect with a set of real-time filters parameterized by distance, azimuth, and elevation.
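To illustrate, here is a sketch of one such per-ear filter: a one-pole lowpass whose cutoff drops as the source moves toward that ear's occluded side and as it gets closer than about a meter. The cutoff mapping and constants are illustrative placeholders loosely inspired by spherical-head models, not the actual filters shipping in the SDK.

```cpp
#include <algorithm>
#include <cmath>

// One-pole lowpass standing in for the head-shadow filter of one ear.
// angleToEar: angle between the source direction and this ear's axis
// (0 = source directly at this ear, pi = fully occluded), in radians.
class HeadShadowFilter {
public:
    HeadShadowFilter(float headDiameterMeters, float sampleRateHz)
        : m_headDiameter(headDiameterMeters), m_sampleRate(sampleRateHz) {}

    void update(float angleToEar, float distanceMeters)
    {
        const float c = 343.0f;  // speed of sound in air, m/s
        // Frequency scale at which the head shadows effectively
        // (~600 Hz for a typical head diameter).
        const float fHead = c / (3.14159265f * m_headDiameter);

        // 0 on the open side, 1 on the fully occluded side.
        const float shadow = 0.5f * (1.0f - std::cos(angleToEar));

        // Open side: essentially no extra filtering. Occluded side:
        // cutoff approaches the head-related scale.
        float cutoff = 18000.0f * (1.0f - shadow) + fHead * shadow;

        // Accentuate the effect as the source comes within ~1 m.
        const float proximity = std::clamp(1.0f - distanceMeters, 0.0f, 1.0f);
        cutoff *= 1.0f - 0.5f * proximity * shadow;

        // One-pole coefficients from the cutoff frequency.
        const float pole = std::exp(-2.0f * 3.14159265f * cutoff / m_sampleRate);
        m_a = 1.0f - pole;
        m_b = pole;
    }

    float process(float in)
    {
        m_state = m_a * in + m_b * m_state;
        return m_state;
    }

private:
    float m_headDiameter;
    float m_sampleRate;
    float m_a = 1.0f;
    float m_b = 0.0f;
    float m_state = 0.0f;
};
```

Running one instance per ear, driven by the per-ear angle and distance computed earlier, yields the subtle, direction-dependent darkening on the occluded side described above.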