Viseme Reference

Updated: Apr 17, 2026

End-of-Life Notice for Oculus Lipsync Plugin

The Oculus Lipsync Plugin has reached end of life and will not receive further updates or support. For audio-driven lip sync, the Movement SDK provides equivalent viseme functionality through audio-based face tracking, delivering the same 15 visemes via the XR_META_face_tracking_visemes OpenXR extension.

The OculusXRMovement module in the Movement SDK supports both visual-based face tracking (Meta Quest Pro) and audio-based face tracking (Meta Quest 2 and later). Audio-based face tracking generates visemes from audio input, providing a migration path from the Oculus Lipsync Plugin.

This documentation is no longer being updated and is subject to removal.

Oculus Lipsync maps human speech to a set of mouth shapes called “visemes”, the visual analog of phonemes. Each viseme depicts the mouth shape for a specific set of phonemes, and visemes are interpolated over time to simulate natural mouth motion. The table below lists the reference images we used to create our own demo shapes. Each row gives the viseme name, example phonemes that map to that viseme, example words, and images showing both mild and emphasized production of that viseme. We hope you find these useful when creating your own models. For more information on these 15 visemes and how they were selected, see the following documentation: Viseme MPEG-4 Standard

Viseme Name	Phonemes	Examples	Emphasized Production
sil	neutral	(none - silence)	None
PP	p, b, m	put, bat, mat
FF	f, v	fat, vat
TH	th	think, that
DD	t, d	tip, doll
kk	k, g	call, gas
CH	tS, dZ, S	chair, join, she
SS	s, z	sir, zeal
nn	n, l	lot, not
RR	r	red
aa	A:	car
E	e	bed
I	ih	tip
O	oh	toe
U	ou	book
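
The table above can be treated as a lookup from phoneme labels to visemes, and the interpolation mentioned earlier as a blend between per-viseme weights. The following is a minimal Python sketch of both ideas, not the plugin's implementation. The names `VISEMES`, `PHONEME_TO_VISEME`, `viseme_for`, `one_hot`, and `blend` are hypothetical, the phoneme labels are copied from the table as written, and linear interpolation is an assumption: this page does not say which interpolation Oculus Lipsync uses.

```python
from typing import Dict, List

# The 15 visemes, in the order they appear in the table above.
VISEMES: List[str] = [
    "sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
    "nn", "RR", "aa", "E", "I", "O", "U",
]

# Phoneme labels exactly as written in the table (hypothetical lookup helper).
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "PP", "b": "PP", "m": "PP",
    "f": "FF", "v": "FF",
    "th": "TH",
    "t": "DD", "d": "DD",
    "k": "kk", "g": "kk",
    "tS": "CH", "dZ": "CH", "S": "CH",
    "s": "SS", "z": "SS",
    "n": "nn", "l": "nn",
    "r": "RR",
    "A:": "aa",
    "e": "E",
    "ih": "I",
    "oh": "O",
    "ou": "U",
}


def viseme_for(phoneme: str) -> str:
    """Return the viseme for a phoneme label; unknown labels map to silence."""
    return PHONEME_TO_VISEME.get(phoneme, "sil")


def one_hot(viseme: str) -> List[float]:
    """Weight vector with 1.0 for the given viseme and 0.0 for the others."""
    return [1.0 if name == viseme else 0.0 for name in VISEMES]


def blend(start: List[float], end: List[float], t: float) -> List[float]:
    """Linearly interpolate between two weight vectors (assumed scheme).

    t is clamped to [0, 1]; t=0 returns start, t=1 returns end.
    """
    t = max(0.0, min(1.0, t))
    return [(1.0 - t) * a + t * b for a, b in zip(start, end)]
```

For example, `blend(one_hot(viseme_for("p")), one_hot(viseme_for("A:")), 0.25)` yields a weight vector that is three quarters PP and one quarter aa, which could drive the matching blend shapes on a model during the transition between the two sounds.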

Animated example

Reference Images