Whisperer sample overview

Updated: May 11, 2026

Overview

This sample demonstrates how to build natural voice interactions in VR using Meta Voice SDK’s Conduit framework, gesture-based activation, and per-object voice UI. Whisperer is a narrative puzzle game in which you manipulate objects with hand gestures and voice commands to progress through a story; it demonstrates reusable patterns for gesture-driven voice gameplay.

What you will learn

  • Gesture-based voice activation using hand tracking and cone-cast object selection
  • Intent routing to specific objects using the Conduit framework’s [MatchIntent] attribute
  • Object-oriented voice interaction patterns with the abstract Listenable base class
  • TTS (text-to-speech) integration for NPC dialogue with animation synchronization
  • Per-object voice UI with real-time transcription feedback and status indicators

Requirements

  • Meta Quest 2, Quest 3, or Quest 3S
  • Unity development environment with Android build support
For detailed SDK versions, dependencies, and platform setup, see the sample README and the development environment setup guide.

Get started

Clone the repository from GitHub, open the project in Unity, and build to your Quest device. The sample uses additive scene loading with a persistent Loader scene that manages the Voice SDK lifecycle. For detailed build instructions, dependencies, and Wit.ai configuration, see the sample README.

Explore the sample

The project uses a two-layer architecture: a persistent Loader scene containing the Voice SDK and player rig, plus additively loaded level scenes containing the interactive objects:
  • Loader.unity: Persistent voice infrastructure. Key concepts: AppVoiceExperience, SpeakGestureWatcher, LevelLoader singleton.
  • Level_1.unity + BaseLayout_Day.unity: Tutorial level with progressive object enablement. Key concepts: Level_1_Manager, coroutine-driven narrative, highlight-only mode.
  • Level_2.unity + BaseLayout_Day.unity: Multi-puzzle voice interactions. Key concepts: Radio, treasure chest, water faucet, HeroPlant movement.
  • Level_3.unity + BaseLayout_Night.unity: Advanced entity extraction and state. Key concepts: Drawer puzzles, color selection via enum entities, PlayerPrefs persistence.
  • Scripts/Voice/Listenable.cs: Abstract base class for voice-interactive objects. Key concepts: Event subscription, shimmer effect, timeout handling.
  • Listenable Objects/ForceMovable.cs: Physics-based voice commands. Key concepts: Manual entity extraction, force application from direction entities.
  • Listenable Objects/HaroldTheBird.cs: NPC with TTS responses. Key concepts: Multiple intents, TTSSpeaker.SpeakQueued(), animation sync.
  • Scripts/Voice/SpeakGestureWatcher.cs: Gesture-based mic activation. Key concepts: Hand position tracking, cone-cast object selection, AppVoiceExperience lifecycle.
  • Scripts/Voice/VoiceUI.cs: Per-object voice feedback UI. Key concepts: Transcription display, status icons, mic level visualization.

Runtime behavior

When you run Level 1, you see a narrative intro and then the tutorial prompts you to raise both hands near your face to activate the speak gesture. A cone extends from your head showing the object you are targeting. When the pose activates, a floating UI appears above the object showing mic levels and partial transcription. You can say “move left” or “jump” to control a plant pot using physics forces. Harold the bird wakes up and responds to voice commands like “pretty bird” with TTS dialogue. As you progress, more objects become voice-interactive, and each puzzle completion triggers the next narrative beat.

Key concepts

Gesture-based activation

The sample uses hand tracking to detect a “speak pose” where both hands are near the face. SpeakGestureWatcher measures distance from both hand transforms to a reference point, then cone-casts from the headset to find Listenable objects in view:
// The speak pose requires both hands within the distance threshold of the face
var havePose = leftDist < _handsDistThresh && rightDist < _handsDistThresh;
// Suppress the pose if speaking is disallowed or hand tracking is not ready
if (!_allowSpeak || !_hands.SpeakHandsReady) havePose = false;
When a target is found, AppVoiceExperience.Activate() starts the microphone. See SpeakGestureWatcher.cs for the full implementation.
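For orientation, here is a minimal sketch of the cone-cast idea: select the Listenable whose direction deviates least from the gaze vector, within an angular threshold and range. The class name, field names, and the FindObjectsOfType scan are illustrative assumptions, not the sample's actual implementation:

using UnityEngine;

// Illustrative sketch only: pick the Listenable closest to the gaze ray within a cone
public class ConeSelectorSketch : MonoBehaviour
{
    [SerializeField] private Transform _head;            // headset transform
    [SerializeField] private float _coneAngleDeg = 15f;  // assumed half-angle of the selection cone
    [SerializeField] private float _maxDistance = 10f;   // assumed maximum selection range

    public Listenable FindTarget()
    {
        Listenable best = null;
        var bestAngle = _coneAngleDeg;
        foreach (var candidate in FindObjectsOfType<Listenable>())
        {
            var toObject = candidate.transform.position - _head.position;
            if (toObject.magnitude > _maxDistance) continue;
            // Keep the candidate whose direction deviates least from the gaze vector
            var angle = Vector3.Angle(_head.forward, toObject);
            if (angle < bestAngle)
            {
                bestAngle = angle;
                best = candidate;
            }
        }
        return best;
    }
}

In the sample, when a target like this is found and the pose is held, the watcher activates the microphone as described below.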

Conduit intent routing with MatchIntent

The sample decorates methods with [MatchIntent] attributes to auto-route intents to handler functions. Conduit supports zero-parameter handlers, WitResponseNode injection (raw response data from Wit.ai), and auto-mapped enum parameters:
[MatchIntent("move")]
public void Move(ForceDirection direction, WitResponseNode node)
{
    ForceMove(direction, node);
}
The ForceDirection enum is automatically populated from Wit.ai entities. See HeroPlant.cs for enum usage and ForceMovable.cs for manual entity extraction patterns.
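To make the enum mapping concrete, a sketch like the following shows the shape of such a type alongside a zero-parameter handler. The enum values and the Jump body are assumptions for illustration; Conduit matches entity keyword values to enum member names, so the real values must mirror your Wit.ai entity:

using UnityEngine;
using Meta.WitAi; // MatchIntent's namespace in recent Voice SDK versions (assumption)

// Assumed for illustration: member names must mirror the Wit.ai direction entity's keywords
public enum ForceDirection { Left, Right, Forward, Back, Up }

public class JumpHandlerSketch : MonoBehaviour
{
    // Zero-parameter handler: fires on intent match alone, no entities required
    [MatchIntent("jump")]
    public void Jump()
    {
        var body = GetComponent<Rigidbody>();
        if (body != null) body.AddForce(Vector3.up * 2f, ForceMode.Impulse);
    }
}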

Object-oriented voice interaction pattern

All voice-interactive objects extend the abstract Listenable class, which manages Voice SDK event subscription, response handling, and visual feedback:
// Subscribe to Voice SDK callbacks when this object becomes the active listener
_appVoiceExperience?.VoiceEvents.OnResponse.AddListener(HandleResponse);
_appVoiceExperience?.VoiceEvents.OnError.AddListener(HandleWitFailOnError);
Subclasses include ForceMovable (physics commands), Openable (open/close), Radio (on/off/station), and HaroldTheBird (NPC dialogue). See Listenable.cs for the full hierarchy.
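The overall subscribe/unsubscribe lifecycle looks roughly like the sketch below. Apart from VoiceEvents.OnResponse and VoiceEvents.OnError, the member names are illustrative, and the Oculus.Voice namespace is an assumption that varies by SDK version; see Listenable.cs for the real base class:

using UnityEngine;
using Meta.WitAi.Json;
using Oculus.Voice; // AppVoiceExperience's namespace (assumption; varies by SDK version)

// Illustrative sketch of the base-class pattern, not the sample's actual Listenable
public abstract class ListenableSketch : MonoBehaviour
{
    [SerializeField] protected AppVoiceExperience _appVoiceExperience;

    protected virtual void OnEnable()
    {
        _appVoiceExperience?.VoiceEvents.OnResponse.AddListener(HandleResponse);
        _appVoiceExperience?.VoiceEvents.OnError.AddListener(HandleError);
    }

    protected virtual void OnDisable()
    {
        // Unsubscribe so callbacks never reach a disabled or destroyed object
        _appVoiceExperience?.VoiceEvents.OnResponse.RemoveListener(HandleResponse);
        _appVoiceExperience?.VoiceEvents.OnError.RemoveListener(HandleError);
    }

    protected abstract void HandleResponse(WitResponseNode response);
    protected abstract void HandleError(string error, string message);
}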

TTS integration for NPC dialogue

Harold the bird uses TTSSpeaker to speak responses. Animation events sync with TTS playback:
// Pull the phrase from the Wit.ai entity, then queue it for speech playback
var whatToSay = witResponse.GetFirstEntityValue("something:something");
_speaker.SpeakQueued(whatToSay);
OnTextPlaybackStart triggers the “Talk” animation, and OnTextPlaybackStop returns to “Idle_Alt”. See HaroldTheBird.cs for the complete pattern.
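Wired up by hand, that synchronization might look like the sketch below. It assumes TTSSpeaker exposes the playback events under an Events property with a string payload; verify the event names and signatures against your SDK version:

using UnityEngine;
using Meta.WitAi.TTS.Utilities; // TTSSpeaker's namespace (assumption; varies by SDK version)

// Illustrative sketch: drive animator states from TTS playback events
public class TalkAnimationSyncSketch : MonoBehaviour
{
    [SerializeField] private TTSSpeaker _speaker;
    [SerializeField] private Animator _animator;

    private void OnEnable()
    {
        _speaker.Events.OnTextPlaybackStart.AddListener(OnPlaybackStart);
        _speaker.Events.OnTextPlaybackStop.AddListener(OnPlaybackStop);
    }

    private void OnDisable()
    {
        _speaker.Events.OnTextPlaybackStart.RemoveListener(OnPlaybackStart);
        _speaker.Events.OnTextPlaybackStop.RemoveListener(OnPlaybackStop);
    }

    private void OnPlaybackStart(string text) => _animator.Play("Talk");
    private void OnPlaybackStop(string text) => _animator.Play("Idle_Alt");
}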

Per-object voice UI

Each Listenable gets a VoiceUI instance showing real-time transcription, mic levels, and status icons (listening/thinking/success/fail/error). The UI is instantiated by LevelManager.Start() and positioned above the object. See VoiceUI.cs for UI state management.
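A trimmed-down version of that feedback loop could look like the following sketch. The class, field names, and UI components are assumptions; OnPartialTranscription and OnMicLevelChanged are the VoiceEvents hooks for streaming text and input level in recent Voice SDK versions, but verify them against yours:

using UnityEngine;
using UnityEngine.UI;
using Oculus.Voice; // AppVoiceExperience's namespace (assumption; varies by SDK version)

// Illustrative sketch: live transcription text plus a mic level bar
public class VoiceUISketch : MonoBehaviour
{
    [SerializeField] private AppVoiceExperience _appVoiceExperience;
    [SerializeField] private Text _transcriptionLabel;
    [SerializeField] private Image _micLevelBar;

    private void OnEnable()
    {
        _appVoiceExperience.VoiceEvents.OnPartialTranscription.AddListener(OnTranscription);
        _appVoiceExperience.VoiceEvents.OnMicLevelChanged.AddListener(OnMicLevel);
    }

    private void OnDisable()
    {
        _appVoiceExperience.VoiceEvents.OnPartialTranscription.RemoveListener(OnTranscription);
        _appVoiceExperience.VoiceEvents.OnMicLevelChanged.RemoveListener(OnMicLevel);
    }

    // Show words as they stream in, before the final transcription arrives
    private void OnTranscription(string text) => _transcriptionLabel.text = text;

    // Scale the bar fill with the current microphone input level
    private void OnMicLevel(float level) => _micLevelBar.fillAmount = Mathf.Clamp01(level);
}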

Extend the sample

  • Add a new voice-interactive object type: Create a class extending Listenable, decorate methods with [MatchIntent], and add corresponding intents to your Wit.ai app (see the sketch after this list). Reference Radio.cs for simple on/off patterns or ForceMovable.cs for entity extraction.
  • Build a custom narrative level: Create a LevelManager subclass with coroutine-driven narrative beats. Register your scenes in LevelLoader and use progressive object enablement to gate progression.
  • Implement multi-entity commands: Extend your Wit.ai app with compound entities and create Conduit methods accepting multiple enum parameters.
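
As a starting point, a new on/off object might look like the sketch below. The Lantern class and its intent names are hypothetical, and it assumes Listenable subclasses need no extra wiring for Conduit routing; you may also need to implement Listenable's abstract response handlers, so mirror Radio.cs for the real pattern and declare matching intents in your Wit.ai app:

using UnityEngine;
using Meta.WitAi; // MatchIntent's namespace (assumption; varies by SDK version)

// Hypothetical new Listenable subclass with simple on/off voice commands
public class Lantern : Listenable
{
    [SerializeField] private Light _light;

    // Intent names are placeholders; they must exist in your Wit.ai app
    [MatchIntent("turn_on")]
    public void TurnOn() => _light.enabled = true;

    [MatchIntent("turn_off")]
    public void TurnOff() => _light.enabled = false;
}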