
Oculus Lipsync Guide

Updated: Apr 17, 2026

End-of-Life Notice for Oculus Lipsync Plugin

The Oculus Lipsync Plugin has reached end-of-life and will not receive further updates or support. For audio-driven lip sync, the Movement SDK provides equivalent viseme functionality through audio-based face tracking, delivering the same 15 visemes via the XR_META_face_tracking_visemes OpenXR extension.

The OculusXRMovement module in the Movement SDK supports both visual-based face tracking (Meta Quest Pro) and audio-based face tracking (Meta Quest 2 and later). Audio-based face tracking generates visemes from audio input, providing a migration path from the Oculus Lipsync Plugin.

This documentation is no longer being updated and is subject to removal.

Oculus Lipsync is a set of plugins and APIs that can be used to sync avatar lip movements to speech sounds and laughter. Lipsync analyzes an audio input stream, from either a microphone or an audio file, and predicts a set of values called visemes, which are gestures or expressions of the lips and face that correspond to a particular speech sound. Visemes can be used to animate the lips of an avatar. With Lipsync, visemes can be precomputed to save CPU or generated in real time.
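
To illustrate the difference between precomputed and real-time visemes, the following is a minimal sketch of how precomputed viseme frames might be looked up during playback instead of being recomputed from audio. The `PrecomputedVisemes` class, its fixed `frame_rate`, and the one-list-of-weights-per-frame layout are assumptions made for this example and are not the Lipsync plugin's actual data format or API.

```python
from dataclasses import dataclass
from typing import List, Sequence

VISEME_COUNT = 15  # Lipsync predicts 15 viseme values per frame


@dataclass
class PrecomputedVisemes:
    """Hypothetical container for viseme frames computed offline."""

    frame_rate: float              # frames per second (assumed constant)
    frames: List[Sequence[float]]  # one sequence of viseme weights per frame

    def sample(self, time_seconds: float) -> Sequence[float]:
        """Return the frame at or just before the playback time, clamped to the clip."""
        if not self.frames:
            # No data: fall back to all-zero weights.
            return [0.0] * VISEME_COUNT
        index = int(time_seconds * self.frame_rate)
        index = max(0, min(index, len(self.frames) - 1))
        return self.frames[index]
```

At runtime, the only per-frame cost in this sketch is an index computation and a list lookup, which is the reason precomputing can save CPU compared with analyzing audio live.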

Animated Lipsync Example

The following animated image shows how you could use Lipsync to say “Welcome to the Oculus Lipsync demo.”

Laughter Detection

Lipsync version 1.30.0 and newer supports laughter detection, which can help add more character and emotion to your avatars.

The following animation shows an example of laughter detection.

Visemes and Lipsync

A viseme describes a visual gesture or expression of the lips and face that corresponds to a particular speech sound, similar to how a phoneme describes a sound. The term is used in discussions of lip reading, where a viseme is the basic visual unit of intelligibility. In computer animation, visemes may be used to animate avatars so that they look like they are speaking.

Lipsync uses a repertoire of visemes to modify avatars based on a specified audio input stream. Each viseme targets a specified geometry morph target in an avatar to influence the amount that target will be expressed on the model. Thus, with Lipsync we can generate realistic lip movement in sync with what is being spoken or heard. This enhances the visual cues that one can use when populating an application with avatars, whether the character is controlled by the user or is a non-playable character (NPC).
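
The way a viseme value influences a morph target can be sketched as follows. This is an illustration only: the function name, the dictionary-based avatar representation, and the morph target names are hypothetical, and the sketch assumes viseme weights are meant to lie between 0 and 1.

```python
from typing import Dict, Mapping


def apply_viseme_weights(
    viseme_weights: Mapping[str, float],
    viseme_to_morph_target: Mapping[str, str],
    morph_targets: Dict[str, float],
) -> None:
    """Write each viseme weight into the morph target it is mapped to.

    morph_targets is modified in place; visemes with no mapped target
    are skipped so that partially rigged avatars still work.
    """
    for viseme, weight in viseme_weights.items():
        target = viseme_to_morph_target.get(viseme)
        if target is None:
            continue  # this avatar has no morph target for the viseme
        # Clamp to the assumed valid range before applying.
        morph_targets[target] = max(0.0, min(1.0, weight))
```

In a game engine, the final assignment would typically be replaced by the engine's own call for setting a morph target weight on a mesh.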

The Lipsync system maps to 15 separate viseme targets: sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, and ou. Each viseme describes the facial expression produced when uttering the corresponding speech sound. For example, the viseme sil corresponds to a silent or neutral expression, PP corresponds to pronouncing the first syllable of “popcorn”, and FF corresponds to the first syllable of “fish”. See the Viseme Reference Images for images that represent each viseme.
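
As a small illustration of working with the 15 viseme targets, the sketch below lists them in the order given above and picks the most strongly expressed one from a frame of weights. The assumption that a frame is a sequence of 15 weights in this same order, and the `dominant_viseme` helper itself, are introduced for this example and are not part of the Lipsync API.

```python
from typing import Sequence

# The 15 viseme targets, in the order listed in this guide.
VISEMES = (
    "sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
    "nn", "RR", "aa", "E", "ih", "oh", "ou",
)


def dominant_viseme(weights: Sequence[float]) -> str:
    """Return the name of the viseme with the largest weight.

    Ties resolve to the viseme that appears first in VISEMES.
    """
    if len(weights) != len(VISEMES):
        raise ValueError(f"expected {len(VISEMES)} weights, got {len(weights)}")
    index = max(range(len(VISEMES)), key=lambda i: weights[i])
    return VISEMES[index]
```

A helper like this can be handy for debugging, for example to log which viseme is active while a line of dialogue plays.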

These 15 visemes were selected to give the maximum range of lip movement and are language-agnostic. For more information, see the Viseme MPEG-4 Standard.

Topic Guide by Development Platform

As mentioned previously, Lipsync offers plugins for popular game engines and APIs for native development. The following table links to topics on installing and using Lipsync for Unity, Unreal, or native C++ development.

Description	Topic
Overview of the tool, requirements, download and setup	Unreal Overview
Using Oculus Lipsync	Using the Unreal Lipsync package
Precompute visemes to improve performance	Guide to Precomputing Visemes for Unreal
Sample	Exploring Oculus Lipsync with the Unreal Sample