Audio

Updated: Oct 7, 2025
Audio plays a crucial role in crafting immersive, engaging, and accessible experiences. The ability to accurately localize audio sources in 3D space is essential for establishing presence in your applications. Meta Horizon supports various ways to play sound, including world-locked, direction-locked, and head-locked playback, across a range of development options. Whether you are developing immersive experiences with Unity or Unreal Engine, building Worlds with the Meta Horizon Worlds desktop editor, or designing 2D apps with Meta Spatial SDK, this page will enhance your understanding of audio design principles in immersive experiences.
By the end of this guide, you can expect to learn how to effectively utilize head-tracked spatial audio, understand the benefits and challenges of this approach, and gain practical insights into designing engaging and accessible audio experiences.
Meta Quest user wearing a headset with visualizations of spatial audio sources.

Usage

Developers and designers can use sound to evoke emotions and provide deeper meaning or context to visuals and interactions. Effective sound design provides critical feedback to users on their interactions as well as their location, which is essential for spatial experiences. By leveraging our natural ability to perceive sound in 3D space, spatial cues such as distance, direction, and location create a sense of presence, drawing users into the experience and making them feel like they are part of the environment. Sound can also expand our awareness beyond our field of view and lighten our cognitive load when multitasking. Audio features such as dynamic mixing, acoustic propagation, and real-time audio processing can work together with sound design and spatial audio to create a seamless and cohesive audio user experience (UX), leading to a more impactful and lasting experience. This documentation covers the technology used to implement audio UX, outlines key design principles, and offers guidance on designing and using sound for spatial experiences.
  • Immersive Experiences: Sound design and audio UX play a crucial role in creating immersive and engaging experiences in passthrough and fully immersive applications, enhancing the sense of presence and realism.
  • Improved Engagement: Well-designed sound and thoughtful audio UX can significantly enhance the sense of immersion and engagement in various applications, making them more enjoyable and memorable.
  • Increased Accessibility: Thoughtful sound design and audio UX can improve accessibility for users with disabilities, providing alternative ways to interact with and experience content.
  • Enhanced Emotional Connection: Effective sound design can create a deeper emotional connection between users and the application, game, or story, leading to a more impactful and lasting experience.

Terminology

Here are frequently used terms and concepts you should be familiar with:
Anechoic
Producing no echoes; very low or no reverberation.
Ambisonic
A recording and synthesis technique and related technology used to play back audio in a way that is speaker-independent and simulates the way humans hear sound from a fixed position in space. Meta Quest uses AmbiX B-format ambisonics with four channels ordered W, Y, Z, X, where W is an omnidirectional channel and Y, Z, X correspond to the three spatial axes.
Attenuation
A loss of energy; in acoustics, typically a reduction in volume.
Audio feedback
A system that uses sound or music to provide feedback to the user about their actions or interactions, or to notify the user of events or updates. See also haptic feedback.
Audio mixing
The process of combining multiple audio signals into a single signal, using techniques such as volume attenuation. When this process is controlled in real time by script or code, it is called Dynamic Mixing.
Data compression
A technique used to reduce the size of audio files. Contrast this with Dynamic compression below.
Direct sound
Sound that has traveled directly to the listener. Contrast this with Reverberant sound below.
Dynamic compression
A technique used to adjust the volume of audio signals in real time, reducing loud sounds and amplifying quiet ones, to create a more consistent and balanced listening experience (see the sketch after this table). Contrast this with Data compression above.
Early reflections
Sounds that bounce off a surface such as a nearby wall and reach the listener before Late reflections.
Head-Related Transfer Function (HRTF)
A mathematical model that describes how sound waves interact with the human head and ears, used to create more realistic spatial audio simulations.
Late reflections
Sounds that bounce off multiple surfaces such as nearby walls and reach the listener after Early reflections.
Object-based audio
A technique used to create more realistic audio experiences by modeling the behavior of individual objects in a virtual environment, allowing for accurate sound localization and spatialization in 3D space.
Reverberant sound
Sound that has reflected or reverberated before arriving at a listener’s location. Contrast this with Direct sound above.
Reverberation
The reflection of sound off of a surface, or the temporary persistence of sound in a space caused by reflecting off multiple surfaces.
Sound localization
The process of determining the location of a sound’s origin or the suggestion of an object’s location based on the manipulation of auditory cues.
Spatial audio
Recreates how humans hear sound in three-dimensional space, using binaural rendering to simulate the way sound waves interact with our ears and head. This results in a more immersive and plausible user experience, especially when combined with visual and haptic elements.
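As a concrete illustration of dynamic compression, the sketch below attenuates samples that exceed a threshold according to a ratio. This is a minimal, illustrative Python example with made-up parameter values, not code from any Meta SDK; real compressors also apply attack/release smoothing and make-up gain.

```python
import numpy as np

def compress(signal, threshold_db=-20.0, ratio=4.0):
    """Minimal downward compressor: attenuate samples above a threshold.

    Illustrative sketch only -- it shows the per-sample gain computation,
    without the attack/release smoothing a production compressor needs.
    """
    eps = 1e-12
    level_db = 20.0 * np.log10(np.abs(signal) + eps)    # per-sample level
    over_db = np.maximum(level_db - threshold_db, 0.0)  # overshoot above threshold
    gain_db = -over_db * (1.0 - 1.0 / ratio)            # reduce the overshoot
    return signal * (10.0 ** (gain_db / 20.0))

# A loud burst and a quiet tone end up closer in level after compression.
x = np.concatenate([0.9 * np.ones(4), 0.05 * np.ones(4)])
print(compress(x).round(3))
```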

Technology and how it works

This section provides insights into spatial audio and acoustic simulation technologies. These advanced audio technologies are designed to deliver an optimal audio user experience and cater to diverse user needs and abilities. Exploring the details of how these technologies function—with a special focus on localization, acoustic modeling, and their limitations—can help you understand the measures you can take to mitigate these challenges.

Spatial audio

An essential element of immersive audio is spatialization: the ability to play a sound as if it is positioned at any specific point in three-dimensional space.
Tools like the Meta XR Audio SDK help place sounds naturally in the space around the user, allowing them to perceive audio coming from different directions using complex filters called head-related transfer functions (HRTFs). HRTF-based object and ambisonic spatialization modifies sounds in real time to make them localizable, so they seem to come from distinct positions within the environment.
Example of two people using HRTFs.

The sounds we experience are directly shaped by the geometry of our body (especially our ears), as well as the direction of the incoming sound. These two elements, our body and the direction of the audio source, form the basis of HRTFs, which are acoustic filters used to spatialize sound.
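Full HRTF filtering is too involved for a short snippet, but the two strongest cues it encodes, the interaural time difference (ITD) and the interaural level difference (ILD), can be approximated with simple geometry. The Python sketch below uses Woodworth's spherical-head approximation and equal-power panning; the constants are illustrative, and this is not how the Meta XR Audio SDK implements spatialization.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.0875    # m, approximate adult head radius

def interaural_cues(azimuth_deg):
    """Approximate ITD/ILD cues for a source at a given azimuth.

    azimuth_deg: 0 = straight ahead, +90 = hard right.
    Returns (itd_seconds, left_gain, right_gain). A sketch using
    Woodworth's spherical-head ITD model and a sine-law level
    difference -- not a measured HRTF.
    """
    az = np.radians(azimuth_deg)
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (az + np.sin(az))  # Woodworth
    pan = np.sin(az)                        # -1 (left) .. +1 (right)
    left_gain = np.sqrt(0.5 * (1.0 - pan))  # equal-power panning
    right_gain = np.sqrt(0.5 * (1.0 + pan))
    return itd, left_gain, right_gain

# Source hard right: ~0.66 ms delay to the far ear, right ear louder.
print(interaural_cues(90))
```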

Room acoustics

In addition to spatialization, acoustic simulations enhance realism by modeling how sound waves interact with the environment, taking into account factors such as geometry, materials, and reflections. This allows developers to create more immersive audio experiences by simulating how sounds behave in different spaces, such as echoing in a large hall or being absorbed by soft furnishings. By integrating these simulations, developers can create audio environments that are not only spatially accurate but also acoustically rich, further enhancing the user’s sense of immersion and presence.
Example of sounds being experienced indirectly.

The sounds we experience are also indirectly shaped by the geometry of our environment, as well as our distance from the incoming sound. These two elements, our environment and the distance of the audio source, form the basis of environment and distance modeling, which provide the cues we use to understand the space around us.
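One classic technique for modeling these cues is the image-source method: each wall of a rectangular ("shoebox") room contributes a mirrored copy of the source, and the extra path length of each mirror image determines a reflection's delay and attenuation. A minimal first-order Python sketch follows; the room size, positions, and absorption coefficient are made-up values, and real simulations model many reflection orders and frequency-dependent materials.

```python
import numpy as np

def first_order_reflections(src, listener, room, absorption=0.3):
    """First-order image-source reflections in a shoebox room.

    src, listener: (x, y, z) positions in meters; room: (Lx, Ly, Lz)
    dimensions with walls at 0 and L on each axis.
    Returns a sorted list of (delay_seconds, gain), one per wall.
    """
    c = 343.0  # speed of sound, m/s
    src, listener = np.asarray(src, float), np.asarray(listener, float)
    reflections = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = src.copy()
            image[axis] = 2.0 * wall - image[axis]  # mirror source across the wall
            dist = np.linalg.norm(image - listener)
            delay = dist / c
            gain = (1.0 - absorption) / max(dist, 1.0)  # absorption + 1/r decay
            reflections.append((delay, gain))
    return sorted(reflections)

for delay, gain in first_order_reflections((2, 3, 1.5), (4, 3, 1.5), (6, 5, 3)):
    print(f"{delay * 1000:5.1f} ms  gain {gain:.3f}")
```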

Head-Tracking

Immersive headsets such as the Meta Quest use head-tracking technology to monitor the orientation and position of a user’s head. This technology is a crucial component in creating high-quality audio user experiences, particularly in the context of spatial audio and acoustic simulations.
Head-tracking relies on sensors or cameras to detect head movements. This data allows the audio system to adjust spatialized sounds in real time, anchoring them within the virtual environment as the user moves. By accurately capturing head motion, head-tracking technology enables the creation of immersive and realistic audio environments that mimic the way sound behaves in the real world.
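At its core, head-tracked audio is a coordinate transform: every update, each world-locked source position is re-expressed relative to the listener's current head pose, so the source's apparent direction stays anchored as the head turns. The yaw-only Python sketch below illustrates the idea with hypothetical positions; real systems use full six-degree-of-freedom poses (position plus quaternion orientation) supplied by the tracking sensors.

```python
import numpy as np

def world_to_head(source_pos, head_pos, head_yaw_deg):
    """Express a world-locked source position in head-relative coordinates.

    Yaw-only sketch: subtract the head position, then rotate by the
    inverse of the head's yaw about the vertical (z) axis. The result
    is what a spatializer would use to select HRTF filters.
    """
    inv = -np.radians(head_yaw_deg)  # inverse rotation undoes the head's turn
    rot = np.array([[np.cos(inv), -np.sin(inv), 0.0],
                    [np.sin(inv),  np.cos(inv), 0.0],
                    [0.0,          0.0,         1.0]])
    return rot @ (np.asarray(source_pos, float) - np.asarray(head_pos, float))

# A source 2 m straight ahead (+y) stays world-locked: after the user
# turns 90 degrees to the left, it is rendered from the user's right (+x).
print(world_to_head((0.0, 2.0, 0.0), (0.0, 0.0, 0.0), 90.0))
```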

Limitations & mitigation

Every technology comes with its own set of limitations and challenges that must be addressed in order to ensure optimal performance and usability. This section will delve into these aspects, discussing mitigation strategies through design or code.

Spatial audio

Spatial audio brings an unlimited auditory field to immersive experiences, allowing users to perceive sound from any direction (including behind, above, or below them), distance, or location, but it requires sufficient CPU headroom and low enough latency to ensure that the audio rendering stays synchronized with the user’s head movement. Product-wide performance optimizations have a cascading effect that ensures sufficient processing power for low latency and accurate spatialization.
User experience with spatial audio.

User experience without spatial audio.

Acoustic simulation

Acoustic propagation and room acoustics help create plausible-sounding environments in spatial computing, making it feel like you’re really there by simulating the way sound bounces off walls, floors, and ceilings. If the focus on creating visually stunning environments leads to neglect of their acoustic properties, or produces spaces that are acoustically challenging, the simulated environment may behave in ways that defy physical laws or contradict users’ everyday experiences, resulting in an unbalanced and unrealistic overall experience. Acoustically matching such visual scenes can also introduce unwanted artifacts, such as echoes, ringing, or other distortions.
Example of direct and indirect audio in an acoustic simulation.

Designing the environment to take into account the limitations of acoustic simulation technology can help improve the overall audio experience. Meta offers two different acoustic simulation technologies: Shoebox Reverb (efficient but less accurate) and Acoustic Ray Tracing (ART) (computationally more intensive, but with more plausible results).
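When judging whether a space will be acoustically challenging, a useful back-of-the-envelope number is its reverberation time. Sabine's classic formula estimates RT60, the time for sound to decay by 60 dB, from the room volume and total surface absorption. The Python sketch below uses a hypothetical room; the absorption coefficients are rough illustrative values.

```python
def sabine_rt60(volume_m3, surfaces):
    """Sabine's formula: RT60 = 0.161 * V / A, where A is the total
    absorption (surface area times absorption coefficient, summed
    over all surfaces)."""
    total_absorption = sum(area * coeff for area, coeff in surfaces)
    return 0.161 * volume_m3 / total_absorption

# Hypothetical 6 x 5 x 3 m room: hard walls and ceiling, carpeted floor.
surfaces = [
    (2 * (6 * 3 + 5 * 3), 0.05),  # four walls, low absorption
    (6 * 5, 0.05),                # ceiling
    (6 * 5, 0.30),                # carpeted floor
]
print(f"RT60 = {sabine_rt60(6 * 5 * 3, surfaces):.2f} s")  # ~1 s: fairly live
```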

Head-Tracking

Head-tracking enables a personalized audio experience in spatial audio, tailoring the sound output to each individual user’s unique head movements and focus. However, overemphasizing visual design elements within the user’s field of view (FOV) can lead to neglect of audio’s unlimited field of audition, resulting in an unbalanced and less immersive experience. Designing the experience to be spatial by default can help improve the overall user experience.
Audio without head-tracking consideration.

Audio with head-tracking consideration.

Head-tracking also enables accurate sound localization in spatial audio, allowing users to pinpoint the source of sounds in 3D space by adjusting the sound output in real time to match the user’s head movements, creating a more engaging and interactive experience. Neglecting spatial audio’s continuous presence during visual optimization can cause jarring audio-visual disconnects, disrupting the user experience. Ensuring spatial audio is considered when defining rendering optimizations prevents disruptions and enriches the overall user experience.

Audio rendering

High-quality audio rendering enables clear and distinct audio, allowing users to easily distinguish between different sounds and enjoy an immersive experience, but sounds with broadband frequency content can make it difficult to hear details in other sounds. Overlapping sounds can make it difficult to separate their contents, which can be tiring for users.
Overlapping audio sources without spatial separation can mask each other, making sounds hard to distinguish.

Designing sounds within distinct audio frequency ranges and leveraging spatial audio reduces cognitive load by creating separation between overlapping sounds and unlocking multitasking.
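One way to apply this in practice is to author concurrent cues in distinct frequency ranges. The Python sketch below synthesizes two hypothetical UI sounds, a low confirmation tone and a high notification tick, whose spectra barely overlap; the frequencies and levels are illustrative choices, not recommendations from any SDK.

```python
import numpy as np

SAMPLE_RATE = 48_000

def tone(freq_hz, duration_s=0.15, amp=0.4):
    """Short sine burst with a linear fade-out to avoid clicks."""
    t = np.arange(int(SAMPLE_RATE * duration_s)) / SAMPLE_RATE
    envelope = np.linspace(1.0, 0.0, t.size)
    return amp * np.sin(2 * np.pi * freq_hz * t) * envelope

# Two concurrent cues in distinct frequency ranges: because their spectra
# barely overlap, they mask each other far less than two cues sharing a
# range would, and spatializing them at different positions separates
# them further.
confirm = tone(320.0)   # low-frequency confirmation tone
notify = tone(2400.0)   # high-frequency notification tick
mix = confirm + notify  # equal-length bursts played together
print(f"{mix.size} samples, peak level {np.max(np.abs(mix)):.2f}")
```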

Design

This section will guide you in using sound for interactions and across the user experience, helping you familiarize yourself with the audio UX primitives, understand the design principles, and discover the key dos and don’ts.

Audio UX primitives

These core primitives ensure that experiences are both technically robust and emotionally resonant, and are rooted in a deep understanding of both the technical and perceptual aspects of audio. By combining these elements with a creative and inclusive mindset, it’s possible to create audio experiences that are not only technically sound but also emotionally impactful.

Perceptual primitives

Understanding how we perceive sound is essential for creating great audio experiences. By understanding these subtleties, we can craft engaging and informative audio experiences that draw users in. You can shape sound by:
  • Separating and layering sounds within distinct frequency ranges to create clarity.
  • Making changes in amplitude to direct attention.
  • Using character, tone, and rhythm for rich and memorable sounds.
Additionally, being able to locate sounds in space helps us understand our surroundings better, making the experience immersive and improving spatial awareness.

Technical primitives

Several technical methods help create immersive and plausible audio experiences. Key techniques include:
  • Spatial audio to imitate how sound behaves in real life.
  • Dynamic mixing to find the right balance between sounds as they play (see the sketch below).
  • Acoustic effects like echoes and reflections to add depth.
Techniques like equalization, compression, limiting, and noise reduction further fine-tune the audio to make it clear and engaging, enhancing the sense of presence.
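A classic example of dynamic mixing is ducking: while a high-priority sound such as dialogue plays, the ambient bed is automatically attenuated, then eased back afterward. The Python sketch below shows the core gain update with illustrative values; in practice, engines and middleware expose this through mixer buses, snapshots, or side-chain compression.

```python
def duck_ambient(dialogue_active, current_gain,
                 ducked=0.3, full=1.0, smoothing=0.1):
    """One mixer update: ease the ambient gain toward its target.

    While dialogue is active, the ambient bed ducks toward `ducked`;
    otherwise it recovers toward `full`. The smoothing factor avoids
    audible jumps in level.
    """
    target = ducked if dialogue_active else full
    return current_gain + smoothing * (target - current_gain)

# Simulate a few mixer updates: dialogue starts, then stops.
gain = 1.0
for frame, talking in enumerate([True] * 8 + [False] * 8):
    gain = duck_ambient(talking, gain)
    print(f"frame {frame:2d}  dialogue={talking!s:5}  ambient gain={gain:.2f}")
```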

Creative primitives

Creating audio for user engagement includes:
  • Crafting purpose-built sounds such as ambient sounds, user interface feedback, and dynamic sound effects.
  • Using music to set the mood and control how the user feels.
  • Creating a unique sound identity with sound logos for brand recognition.
  • Adding speech, narration, and voice assistants to make interactions feel personal and easy.
  • Supporting accessibility by offering cues and alternatives for those with different needs.
Timing can also highlight important information, using rhythm and sonic changes to guide users through a more engaging experience.

Design principles

Here are fundamental concepts that shape user-friendly audio user experiences.
Cohesive
Ensure all audio elements share the same space as the visual elements to create a unified sense of presence and immersion.
Empowering
Provide users with control and customization options for their audio experience, enhancing comfort and agency.
Immersive
Spatial by default: create audio user experiences that engage users deeply and ground them in virtual experiences.
Connected
Support inclusive, shared acoustic spaces to foster social presence and connection among all users, regardless of ability.
Effortless
Reduce cognitive load with dynamic, reactive audio that mimics real-world hearing, enabling seamless multitasking.
Unified
Harmonize interactions, visuals, and sound to create a cohesive cross-modal experience that fosters connection and presence.
Contextually Relevant
Provide users with relevant audio cues and information that are contextually aligned with their current position within the experience, enhancing their understanding and navigation.

Best practices

Audio UX is a pivotal component of user experience design, strategically utilizing sound to craft immersive, engaging, and accessible experiences. By integrating sound design with spatial audio and leveraging our natural ability to perceive sound in 3D space, developers can evoke emotions, provide context, and enhance user interactions.
DO use spatial audio to create an immersive experience that simulates how humans naturally hear and localize sounds in space, making it easier for users to pinpoint the source of sounds.
DO design experiences where visual, audio, and interaction elements complement and enhance each other, creating a cohesive and immersive cross-modal spatial experience that engages users' senses and fosters a deeper connection to the environment.
DON'T forget to consider users' comfort when designing audio experiences, as loud or jarring sounds can cause discomfort or even pain.
By understanding and applying these concepts, designers can create audio experiences that not only captivate users but also foster a deeper connection to the digital world, ensuring a more impactful and lasting engagement.

Get started with spatial audio technology

Our tools and integrations work seamlessly with all major game engines and audio middleware solutions. Below are a few relevant documents to help you get started:
Meta XR Audio SDK for Unity
This is the Meta XR Audio SDK for Unity documentation and resources page. Here you will find everything needed to integrate spatial audio into Unity projects.
FMOD for Unity
This guide describes guidelines and resources for creating a compelling spatial audio experience in Unity using FMOD as the audio engine. It will walk you through setting up a simple FMOD project using the Meta Audio SDK plug-in for FMOD and integrating that FMOD project into your immersive experience.
Wwise for Unity
This guide describes guidelines and resources for creating a compelling spatial audio experience in Unity using Audiokinetic Wwise as the audio engine. It will walk you through setting up a simple Wwise project using the Meta Audio SDK plug-in for Wwise and integrating that Wwise project into your immersive experience.
Meta XR Audio SDK for Unreal Engine
This is the Meta XR Audio SDK for Unreal Engine documentation and resources page. Here you will find everything needed to integrate spatial audio into Unreal Engine projects.
FMOD for Unreal Engine
This guide describes guidelines and resources for creating a compelling spatial audio experience in Unreal Engine using FMOD as the audio engine. It will walk you through setting up a simple FMOD project using the Meta Audio SDK plug-in for FMOD and integrating that FMOD project into your immersive experience.
Wwise for Unreal Engine
This guide describes guidelines and resources for creating a compelling spatial audio experience in Unreal Engine using Audiokinetic Wwise as the audio engine. It will walk you through setting up a simple Wwise project using the Meta Audio SDK plug-in for Wwise and integrating that Wwise project into your immersive experience.
Meta Spatial SDK
This overview will help you get started understanding spatial audio in Meta Spatial SDK.

Informative video and articles on audio for immersive experiences

Sound design and mixing is an art form, and immersive experiences are a new medium. Whether you’re an aspiring sound designer or a seasoned veteran, creating music and sound for immersive experiences will present new challenges and subvert some of the conventional wisdom in sound design and mixing for games and traditional media.
Introductory video presentations covering the core concepts of audio for immersive experiences:
Presentations and articles that share the latest tools and technologies that drive virtual reality audio:

Learn and discuss audio in the Meta Horizon developer forums

If you’re interested in learning more about Meta Horizon audio or just want to chat with other audio-minded developers, drop by our Developer Forums.