Whisperer sample overview

Updated: May 11, 2026

Overview

This sample demonstrates how to build natural voice interactions in VR using Meta Voice SDK’s Conduit framework, gesture-based activation, and per-object voice UI. Whisperer is a narrative puzzle game in which you manipulate objects with hand gestures and voice commands to progress through a story; it demonstrates reusable patterns for gesture-driven voice gameplay.

What you will learn

  • Gesture-based voice activation using hand tracking and cone-cast object selection
  • Intent routing to specific objects using the Conduit framework’s [MatchIntent] attribute
  • Object-oriented voice interaction patterns with the abstract Listenable base class
  • TTS (text-to-speech) integration for NPC dialogue with animation synchronization
  • Per-object voice UI with real-time transcription feedback and status indicators

Requirements

  • Meta Quest 2, Quest 3, or Quest 3S
  • Unity development environment with Android build support
For detailed SDK versions, dependencies, and platform setup, see the sample README and the development environment setup guide.

Get started

Clone the repository from GitHub, open the project in Unity, and build to your Quest device. The sample uses additive scene loading with a persistent Loader scene that manages the Voice SDK lifecycle. For detailed build instructions, dependencies, and Wit.ai configuration, see the sample README.

Explore the sample

The project uses a two-layer architecture: a persistent Loader scene containing the Voice SDK and player rig, plus additively loaded level scenes containing the interactive objects:
  • Loader.unity: Persistent voice infrastructure. Key concepts: AppVoiceExperience, SpeakGestureWatcher, LevelLoader singleton.
  • Level_1.unity + BaseLayout_Day.unity: Tutorial level with progressive object enablement. Key concepts: Level_1_Manager, coroutine-driven narrative, highlight-only mode.
  • Level_2.unity + BaseLayout_Day.unity: Multi-puzzle voice interactions. Key concepts: Radio, treasure chest, water faucet, HeroPlant movement.
  • Level_3.unity + BaseLayout_Night.unity: Advanced entity extraction and state. Key concepts: Drawer puzzles, color selection via enum entities, PlayerPrefs persistence.
  • Scripts/Voice/Listenable.cs: Abstract base class for voice-interactive objects. Key concepts: Event subscription, shimmer effect, timeout handling.
  • Listenable Objects/ForceMovable.cs: Physics-based voice commands. Key concepts: Manual entity extraction, force application from direction entities.
  • Listenable Objects/HaroldTheBird.cs: NPC with TTS responses. Key concepts: Multiple intents, TTSSpeaker.SpeakQueued(), animation sync.
  • Scripts/Voice/SpeakGestureWatcher.cs: Gesture-based mic activation. Key concepts: Hand position tracking, cone-cast object selection, AppVoiceExperience lifecycle.
  • Scripts/Voice/VoiceUI.cs: Per-object voice feedback UI. Key concepts: Transcription display, status icons, mic level visualization.

Runtime behavior

When you run Level 1, you see a narrative intro and then the tutorial prompts you to raise both hands near your face to activate the speak gesture. A cone extends from your head showing the object you are targeting. When the pose activates, a floating UI appears above the object showing mic levels and partial transcription. You can say “move left” or “jump” to control a plant pot using physics forces. Harold the bird wakes up and responds to voice commands like “pretty bird” with TTS dialogue. As you progress, more objects become voice-interactive, and each puzzle completion triggers the next narrative beat.

Key concepts

Gesture-based activation

The sample uses hand tracking to detect a “speak pose” where both hands are near the face. SpeakGestureWatcher measures distance from both hand transforms to a reference point, then cone-casts from the headset to find Listenable objects in view:
// The speak pose requires both hands within the distance threshold of the face
var havePose = leftDist < _handsDistThresh && rightDist < _handsDistThresh;
// Suppress the pose if speaking is disallowed or hand tracking is not ready
if (!_allowSpeak || !_hands.SpeakHandsReady) havePose = false;
When a target is found, AppVoiceExperience.Activate() starts the microphone. See SpeakGestureWatcher.cs for the full implementation.
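For orientation, here is a minimal sketch of the cone-cast idea: select the Listenable whose direction deviates least from the gaze vector, within an angular threshold and range. The class name, field names, and the FindObjectsOfType scan are illustrative assumptions, not the sample's actual implementation:

using UnityEngine;

// Illustrative sketch only: pick the Listenable closest to the gaze ray within a cone
public class ConeSelectorSketch : MonoBehaviour
{
    [SerializeField] private Transform _head;            // headset transform
    [SerializeField] private float _coneAngleDeg = 15f;  // assumed half-angle of the selection cone
    [SerializeField] private float _maxDistance = 10f;   // assumed maximum selection range

    public Listenable FindTarget()
    {
        Listenable best = null;
        var bestAngle = _coneAngleDeg;
        foreach (var candidate in FindObjectsOfType<Listenable>())
        {
            var toObject = candidate.transform.position - _head.position;
            if (toObject.magnitude > _maxDistance) continue;
            // Keep the candidate whose direction deviates least from the gaze vector
            var angle = Vector3.Angle(_head.forward, toObject);
            if (angle < bestAngle)
            {
                bestAngle = angle;
                best = candidate;
            }
        }
        return best;
    }
}

In the sample, when a target like this is found and the pose is held, the watcher activates the microphone as described below.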

Conduit intent routing with MatchIntent

The sample decorates methods with [MatchIntent] attributes to auto-route intents to handler functions. Conduit supports zero-parameter handlers, WitResponseNode injection (raw response data from Wit.ai), and auto-mapped enum parameters:
[MatchIntent("move")]
public void Move(ForceDirection direction, WitResponseNode node)
{
    ForceMove(direction, node);
}
The ForceDirection enum is automatically populated from Wit.ai entities. See HeroPlant.cs for enum usage and ForceMovable.cs for manual entity extraction patterns.
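To make the enum mapping concrete, a sketch like the following shows the shape of such a type alongside a zero-parameter handler. The enum values and the Jump body are assumptions for illustration; Conduit matches entity keyword values to enum member names, so the real values must mirror your Wit.ai entity:

using UnityEngine;
using Meta.WitAi; // MatchIntent's namespace in recent Voice SDK versions (assumption)

// Assumed for illustration: member names must mirror the Wit.ai direction entity's keywords
public enum ForceDirection { Left, Right, Forward, Back, Up }

public class JumpHandlerSketch : MonoBehaviour
{
    // Zero-parameter handler: fires on intent match alone, no entities required
    [MatchIntent("jump")]
    public void Jump()
    {
        var body = GetComponent<Rigidbody>();
        if (body != null) body.AddForce(Vector3.up * 2f, ForceMode.Impulse);
    }
}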

Object-oriented voice interaction pattern

All voice-interactive objects extend the abstract Listenable class, which manages Voice SDK event subscription, response handling, and visual feedback:
// Subscribe to Voice SDK callbacks when this object becomes the active listener
_appVoiceExperience?.VoiceEvents.OnResponse.AddListener(HandleResponse);
_appVoiceExperience?.VoiceEvents.OnError.AddListener(HandleWitFailOnError);
Subclasses include ForceMovable (physics commands), Openable (open/close), Radio (on/off/station), and HaroldTheBird (NPC dialogue). See Listenable.cs for the full hierarchy.
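The overall subscribe/unsubscribe lifecycle looks roughly like the sketch below. Apart from VoiceEvents.OnResponse and VoiceEvents.OnError, the member names are illustrative, and the Oculus.Voice namespace is an assumption that varies by SDK version; see Listenable.cs for the real base class:

using UnityEngine;
using Meta.WitAi.Json;
using Oculus.Voice; // AppVoiceExperience's namespace (assumption; varies by SDK version)

// Illustrative sketch of the base-class pattern, not the sample's actual Listenable
public abstract class ListenableSketch : MonoBehaviour
{
    [SerializeField] protected AppVoiceExperience _appVoiceExperience;

    protected virtual void OnEnable()
    {
        _appVoiceExperience?.VoiceEvents.OnResponse.AddListener(HandleResponse);
        _appVoiceExperience?.VoiceEvents.OnError.AddListener(HandleError);
    }

    protected virtual void OnDisable()
    {
        // Unsubscribe so callbacks never reach a disabled or destroyed object
        _appVoiceExperience?.VoiceEvents.OnResponse.RemoveListener(HandleResponse);
        _appVoiceExperience?.VoiceEvents.OnError.RemoveListener(HandleError);
    }

    protected abstract void HandleResponse(WitResponseNode response);
    protected abstract void HandleError(string error, string message);
}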

TTS integration for NPC dialogue

Harold the bird uses TTSSpeaker to speak responses. Animation events sync with TTS playback:
// Pull the phrase from the Wit.ai entity, then queue it for speech playback
var whatToSay = witResponse.GetFirstEntityValue("something:something");
_speaker.SpeakQueued(whatToSay);
OnTextPlaybackStart triggers the “Talk” animation, and OnTextPlaybackStop returns to “Idle_Alt”. See HaroldTheBird.cs for the complete pattern.
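Wired up by hand, that synchronization might look like the sketch below. It assumes TTSSpeaker exposes the playback events under an Events property with a string payload; verify the event names and signatures against your SDK version:

using UnityEngine;
using Meta.WitAi.TTS.Utilities; // TTSSpeaker's namespace (assumption; varies by SDK version)

// Illustrative sketch: drive animator states from TTS playback events
public class TalkAnimationSyncSketch : MonoBehaviour
{
    [SerializeField] private TTSSpeaker _speaker;
    [SerializeField] private Animator _animator;

    private void OnEnable()
    {
        _speaker.Events.OnTextPlaybackStart.AddListener(OnPlaybackStart);
        _speaker.Events.OnTextPlaybackStop.AddListener(OnPlaybackStop);
    }

    private void OnDisable()
    {
        _speaker.Events.OnTextPlaybackStart.RemoveListener(OnPlaybackStart);
        _speaker.Events.OnTextPlaybackStop.RemoveListener(OnPlaybackStop);
    }

    private void OnPlaybackStart(string text) => _animator.Play("Talk");
    private void OnPlaybackStop(string text) => _animator.Play("Idle_Alt");
}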

Per-object voice UI

Each Listenable gets a VoiceUI instance showing real-time transcription, mic levels, and status icons (listening/thinking/success/fail/error). The UI is instantiated by LevelManager.Start() and positioned above the object. See VoiceUI.cs for UI state management.
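A trimmed-down version of that feedback loop could look like the following sketch. The class, field names, and UI components are assumptions; OnPartialTranscription and OnMicLevelChanged are the VoiceEvents hooks for streaming text and input level in recent Voice SDK versions, but verify them against yours:

using UnityEngine;
using UnityEngine.UI;
using Oculus.Voice; // AppVoiceExperience's namespace (assumption; varies by SDK version)

// Illustrative sketch: live transcription text plus a mic level bar
public class VoiceUISketch : MonoBehaviour
{
    [SerializeField] private AppVoiceExperience _appVoiceExperience;
    [SerializeField] private Text _transcriptionLabel;
    [SerializeField] private Image _micLevelBar;

    private void OnEnable()
    {
        _appVoiceExperience.VoiceEvents.OnPartialTranscription.AddListener(OnTranscription);
        _appVoiceExperience.VoiceEvents.OnMicLevelChanged.AddListener(OnMicLevel);
    }

    private void OnDisable()
    {
        _appVoiceExperience.VoiceEvents.OnPartialTranscription.RemoveListener(OnTranscription);
        _appVoiceExperience.VoiceEvents.OnMicLevelChanged.RemoveListener(OnMicLevel);
    }

    // Show words as they stream in, before the final transcription arrives
    private void OnTranscription(string text) => _transcriptionLabel.text = text;

    // Scale the bar fill with the current microphone input level
    private void OnMicLevel(float level) => _micLevelBar.fillAmount = Mathf.Clamp01(level);
}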

Extend the sample

  • Add a new voice-interactive object type: Create a class extending Listenable, decorate methods with [MatchIntent], and add corresponding intents to your Wit.ai app (see the sketch after this list). Reference Radio.cs for simple on/off patterns or ForceMovable.cs for entity extraction.
  • Build a custom narrative level: Create a LevelManager subclass with coroutine-driven narrative beats. Register your scenes in LevelLoader and use progressive object enablement to gate progression.
  • Implement multi-entity commands: Extend your Wit.ai app with compound entities and create Conduit methods accepting multiple enum parameters.
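
As a starting point, a new on/off object might look like the sketch below. The Lantern class and its intent names are hypothetical, and it assumes Listenable subclasses need no extra wiring for Conduit routing; you may also need to implement Listenable's abstract response handlers, so mirror Radio.cs for the real pattern and declare matching intents in your Wit.ai app:

using UnityEngine;
using Meta.WitAi; // MatchIntent's namespace (assumption; varies by SDK version)

// Hypothetical new Listenable subclass with simple on/off voice commands
public class Lantern : Listenable
{
    [SerializeField] private Light _light;

    // Intent names are placeholders; they must exist in your Wit.ai app
    [MatchIntent("turn_on")]
    public void TurnOn() => _light.enabled = true;

    [MatchIntent("turn_off")]
    public void TurnOff() => _light.enabled = false;
}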