Orchestrator sample overview

Updated: May 12, 2026

Overview

This sample demonstrates a declarative resource management framework for Android systems, designed for system engineers working on performance optimization and resource control. The sample shows how to build an event-driven daemon using BPF programs for kernel-level monitoring, trait-based policy composition, and cgroupv2 enforcement mechanisms.

What you will learn

  • Use BPF programs to monitor cgroup lifecycle events, CPU execution time, and GPU utilization at the kernel level
  • Implement a trait-based policy system that composes per-process and aggregate resource management strategies
  • Build a multi-runtime architecture with tokio for supervised, long-running flows
  • Integrate with Android’s cgroupv2 infrastructure for memory pressure management and budget enforcement
  • Coordinate graceful shutdown across multiple concurrent async tasks using cancellation tokens

Requirements

  • Target device: Meta Quest device with developer mode enabled for ADB access and deployment
  • Development environment: Linux development environment with Android build tools
For complete build prerequisites and toolchain setup, see the sample README.

Get started

The sample runs as an Android init service and requires Meta’s internal buck2 build system. Clone the orchestrator repository and build the orchestrator binary using buck2. Deploy to your device via the Android build system integration. For step-by-step build and deployment instructions, consult the sample README.

Explore the sample

The sample is organized into three resource domain flows (CPU, GPU, memory) with shared infrastructure in a common crate:
Each entry below lists the file or crate, what it demonstrates, and the key concepts it illustrates:

  • service/src/bin.rs — Main service entry point with multi-runtime supervision. Key concepts: runtime creation, flow dispatch, signal handling, coordinated shutdown.
  • service/init/orchestrator.rc — Android init service integration. Key concepts: system property triggers, capability management, lifecycle hooks.
  • bpf/cgroup.bpf.c — BPF cgroup lifecycle monitoring. Key concepts: ring buffer events, tracepoint hooks, cgroupv2 path filtering.
  • crates/cpuflow/bpf/cpu.bpf.c — Per-CPU, per-PID execution tracking. Key concepts: nested BPF maps, scheduler hooks, frequency-weighted credits.
  • crates/gpuflow/bpf/credits.bpf.c — GPU power level and per-PID utilization tracking. Key concepts: vendor tracepoint hooks, pinned maps, time-in-state accounting.
  • crates/common/src/lib.rs — Core trait definitions. Key concepts: Flow trait, async reactor abstraction.
  • crates/common/src/process_monitor.rs — Policy orchestration and lifecycle. Key concepts: Policy, PolicyFactory, and AggregatePolicy traits; task supervision.
  • crates/common/src/cgroup_monitor.rs — BPF event processing with state machine. Key concepts: sequential event processing, async ring buffer polling, initial filesystem scan.
  • crates/common/src/cgroup.rs — cgroupv2 filesystem abstraction. Key concepts: file descriptor caching, reclaim type control, async file I/O.
  • crates/common/src/stats.rs — PSI and memory statistics parsing. Key concepts: pressure triggers, cgroup stat formats, async pressure monitoring.
  • crates/memflow/src/lib.rs — Fully implemented memory management flow. Key concepts: policy composition, experiment configuration, PSI integration.
  • crates/memflow/src/policy/telemetry.rs — Observability for memory usage. Key concepts: per-process telemetry collection, statsd integration.
  • crates/memflow/src/policy/mem_throttle.rs — Working set size measurement via throttle. Key concepts: memory.high manipulation, WSS estimation, cleanup guarantees.
  • crates/memflow/src/policy/system_psi.rs — PI controller for pressure-based memory limits. Key concepts: proportional-integral control, anti-windup, deadband filtering.
  • crates/memflow/src/policy/senpai_psi.rs — Exponential backoff/probe pressure management. Key concepts: pressure integral tracking, grace periods, power-state awareness.
  • crates/memflow/src/policy/static_min.rs — Guaranteed memory protection for system services. Key concepts: aggregate policy, memory.min enforcement, per-UID accumulation.
  • crates/memflow/src/policy/mem_budget.rs — Declarative budget enforcement with remote config. Key concepts: JSON budget schema, enforcement levels (Trace/Throttle/Kill), dry-run mode.
  • crates/cpuflow/src/lib.rs — Stub CPU flow (infrastructure only). Key concepts: BPF data collection without userspace policy.
  • crates/gpuflow/src/lib.rs — Stub GPU flow (infrastructure only). Key concepts: BPF data collection without userspace policy.

Runtime behavior

When the service starts, it creates four separate tokio runtimes: one single-threaded orchestrator runtime and three multi-threaded flow runtimes (CPU, GPU, memory). The orchestrator runtime spawns each flow on its dedicated runtime and enters a supervision loop that monitors Unix signals and flow health.
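A minimal sketch of that runtime setup using tokio's builder follows; the thread counts and names here are illustrative, not the sample's actual configuration:
use tokio::runtime::{Builder, Runtime};

// One multi-threaded runtime per flow, with named worker threads so each flow
// shows up clearly in thread listings and traces.
fn flow_runtime(name: &str) -> std::io::Result<Runtime> {
    Builder::new_multi_thread()
        .worker_threads(2) // illustrative per-flow thread count
        .thread_name(name)
        .enable_all()
        .build()
}

// The orchestrator only supervises, so a single-threaded runtime suffices.
fn orchestrator_runtime() -> std::io::Result<Runtime> {
    Builder::new_current_thread().enable_all().build()
}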
The memflow flow performs most of the work. On startup, it loads the cgroup monitoring BPF program, reads system pressure from /proc/pressure/memory, builds policy instances based on Android system property configuration, and starts the process monitor. The process monitor listens to a stream of cgroup lifecycle events from the BPF program, reads process information from /proc/{pid}/, and spawns individual policy tick tasks for each enrolled process.
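For reference, PSI files expose lines such as some avg10=0.12 avg60=0.08 avg300=0.02 total=12345. A minimal parsing sketch is shown below; the struct layout is illustrative, and the sample's own parser lives in crates/common/src/stats.rs:
#[derive(Debug, Default, Clone, Copy)]
struct PsiLine {
    avg10: f64,
    avg60: f64,
    avg300: f64,
    total_us: u64, // cumulative stall time in microseconds
}

fn parse_psi_line(line: &str) -> Option<PsiLine> {
    let mut psi = PsiLine::default();
    // Skip the leading "some"/"full" keyword, then read key=value fields.
    for field in line.split_whitespace().skip(1) {
        let (key, value) = field.split_once('=')?;
        match key {
            "avg10" => psi.avg10 = value.parse().ok()?,
            "avg60" => psi.avg60 = value.parse().ok()?,
            "avg300" => psi.avg300 = value.parse().ok()?,
            "total" => psi.total_us = value.parse().ok()?,
            _ => {}
        }
    }
    Some(psi)
}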
The BPF programs run continuously in the kernel. The cgroup monitor emits events when processes attach to cgroups, execute, and exit. The CPU BPF program tracks execution time at scheduler context switches and frequency changes. The GPU BPF program tracks time-in-state at each power level when the GPU hardware changes frequency. All three BPF programs use ring buffers or pinned maps to communicate data to userspace.
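Userspace consumes those ring buffer records as fixed-layout structs. A sketch of the Rust side of that handoff follows; the field names and event encoding are illustrative, not the sample's actual ABI:
// Userspace mirror of a kernel-produced cgroup lifecycle record.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct CgroupEvent {
    event_type: u32, // e.g. attach / exec / exit, encoded as an integer
    pid: u32,
    cgroup_id: u64,
}

// Reinterpret one ring buffer record as a CgroupEvent after a size check.
fn parse_event(data: &[u8]) -> Option<CgroupEvent> {
    if data.len() < std::mem::size_of::<CgroupEvent>() {
        return None;
    }
    // Safety: CgroupEvent is repr(C) and contains only plain integers.
    Some(unsafe { std::ptr::read_unaligned(data.as_ptr() as *const CgroupEvent) })
}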
When the service receives SIGTERM or SIGINT, the orchestrator cancels all flows and waits up to 190 milliseconds for them to exit gracefully. Each policy’s cleanup method restores cgroup files to default values before the task terminates.

Key concepts

Flow trait and multi-runtime architecture

The sample defines a Flow trait in crates/common/src/lib.rs that abstracts a long-running resource management domain. The main binary creates a separate tokio runtime for each flow, isolating flows from one another's scheduler contention and allowing per-flow thread naming for debugging:
#[async_trait]
pub trait Flow: Send {
    async fn run(&mut self, cancel_token: CancellationToken) -> anyhow::Result<()>;
}
The main supervision loop uses tokio::select! with biased to prioritize shutdown signals over flow completion. See service/src/bin.rs for the complete supervision pattern.
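A compressed sketch of that pattern is shown below; the JoinSet and the signal handling are simplifications of what service/src/bin.rs actually does (the sample spawns each flow on its own runtime):
use tokio::signal::unix::{signal, SignalKind};
use tokio::task::JoinSet;
use tokio_util::sync::CancellationToken;

async fn supervise(
    cancel: CancellationToken,
    mut flows: JoinSet<anyhow::Result<()>>,
) -> anyhow::Result<()> {
    let mut sigterm = signal(SignalKind::terminate())?;
    let mut sigint = signal(SignalKind::interrupt())?;
    tokio::select! {
        biased; // check shutdown signals before flow completion
        _ = sigterm.recv() => {}
        _ = sigint.recv() => {}
        _ = flows.join_next() => {} // a flow exiting early also triggers shutdown
    }
    cancel.cancel();
    Ok(())
}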

Policy system with trait-based composition

The policy system uses three traits to enable runtime composition. PolicyFactory conditionally creates per-process policies based on process metadata. Policy defines a periodic tick interface with cleanup guarantees. AggregatePolicy manages multiple processes collectively for cross-process resource decisions:
async fn create(&self, process: &ProcessInfo) -> anyhow::Result<Option<Box<dyn Policy>>>;
Factories return Option<Box<dyn Policy>>, allowing enrollment decisions at runtime. The ProcessMonitor in crates/common/src/process_monitor.rs iterates all factories for each new process and spawns tick tasks only for enrolled policies.
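A sketch of factory-driven enrollment follows, using simplified stand-ins for the sample's ProcessInfo and Policy definitions; the field names, tick interval, and cgroup-path check are illustrative:
use async_trait::async_trait;
use std::time::Duration;

// Simplified stand-ins for the sample's types in crates/common/src.
pub struct ProcessInfo {
    pub pid: u32,
    pub cgroup_path: String,
}

#[async_trait]
pub trait Policy: Send {
    async fn tick(&mut self) -> anyhow::Result<()>;
    async fn cleanup(&mut self) -> anyhow::Result<()>;
    fn interval(&self) -> Duration;
}

#[async_trait]
pub trait PolicyFactory: Send + Sync {
    async fn create(&self, process: &ProcessInfo) -> anyhow::Result<Option<Box<dyn Policy>>>;
}

// A factory that enrolls only processes whose cgroup path looks like an app.
struct AppTelemetryFactory;

struct AppTelemetryPolicy {
    pid: u32,
}

#[async_trait]
impl Policy for AppTelemetryPolicy {
    async fn tick(&mut self) -> anyhow::Result<()> {
        // Illustrative per-tick work: sample and report stats for self.pid.
        let _ = self.pid;
        Ok(())
    }
    async fn cleanup(&mut self) -> anyhow::Result<()> {
        // Restore any cgroup files this policy touched to their defaults.
        Ok(())
    }
    fn interval(&self) -> Duration {
        Duration::from_secs(5)
    }
}

#[async_trait]
impl PolicyFactory for AppTelemetryFactory {
    async fn create(&self, process: &ProcessInfo) -> anyhow::Result<Option<Box<dyn Policy>>> {
        if !process.cgroup_path.contains("/apps/") {
            return Ok(None); // decline enrollment; no tick task is spawned
        }
        Ok(Some(Box::new(AppTelemetryPolicy { pid: process.pid })))
    }
}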

BPF ring buffer integration with state machine

The cgroup monitor uses BPF ring buffers for low-overhead event delivery from kernel to userspace. The sample wraps the ring buffer file descriptor with tokio::io::unix::AsyncFd for async polling:
let ring_buffer = AsyncFd::new(ring_buffer)?;
ring_buffer.readable().await?;
The state machine in crates/common/src/cgroup_monitor.rs tracks which processes are attached to cgroups, handling the Android process lifecycle where app processes get their identity at attach time but system services require exec events.
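A fuller sketch of the polling loop might look like the following; the generic ring buffer handle and the consume callback are assumptions standing in for the sample's BPF bindings:
use std::os::fd::AsRawFd;
use tokio::io::unix::AsyncFd;
use tokio_util::sync::CancellationToken;

async fn poll_ring_buffer<R: AsRawFd>(
    ring_buffer: R,
    cancel: CancellationToken,
    mut consume: impl FnMut(&R) -> anyhow::Result<()>,
) -> anyhow::Result<()> {
    // Register the ring buffer's epoll-able fd with the tokio reactor.
    let async_fd = AsyncFd::new(ring_buffer)?;
    loop {
        tokio::select! {
            _ = cancel.cancelled() => break,
            guard = async_fd.readable() => {
                let mut guard = guard?;
                // Drain all pending records, then clear readiness so the next
                // readable() waits for fresh kernel data.
                consume(async_fd.get_ref())?;
                guard.clear_ready();
            }
        }
    }
    Ok(())
}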

cgroupv2 abstraction with file descriptor caching

The CgroupEntry trait provides async methods for reading and writing cgroupv2 pseudo-files. The FileSystemCgroup implementation caches file descriptors and seeks to the start for repeated reads, avoiding open/close overhead on hot paths:
let stats = cgroup.memory_stats().await?;
cgroup.write_memory_high(limit_bytes).await?;
The ReclaimType parameter controls whether writes use O_NONBLOCK, determining whether memory reclaim is charged to the orchestrator or deferred to the target cgroup. See crates/common/src/cgroup.rs for the complete abstraction.
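A sketch of the cached-descriptor read pattern follows; the struct and method names are illustrative, not the sample's CgroupEntry API:
use std::io::SeekFrom;
use std::path::PathBuf;
use tokio::fs::File;
use tokio::io::{AsyncReadExt, AsyncSeekExt};

struct CachedCgroupFile {
    path: PathBuf,
    file: Option<File>,
}

impl CachedCgroupFile {
    async fn read(&mut self) -> anyhow::Result<String> {
        // Open once, keep the descriptor for later reads.
        if self.file.is_none() {
            self.file = Some(File::open(&self.path).await?);
        }
        let file = self.file.as_mut().unwrap();
        // cgroupv2 pseudo-files regenerate their contents on each read,
        // so rewinding the cached descriptor avoids open/close on hot paths.
        file.seek(SeekFrom::Start(0)).await?;
        let mut contents = String::new();
        file.read_to_string(&mut contents).await?;
        Ok(contents)
    }
}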

Pressure-based memory management

The sample demonstrates two memory pressure control algorithms. The System PSI policy implements a PI controller that adjusts memory.high to maintain a target pressure percentage. The Senpai PSI policy uses exponential backoff when pressure exceeds the target and exponential probing when pressure is below target.
Both policies read cumulative pressure from memory.pressure and compute the delta since the last tick. See crates/memflow/src/policy/system_psi.rs and crates/memflow/src/policy/senpai_psi.rs for the complete control logic.
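A sketch of the proportional-integral update with anti-windup and a deadband follows, where the measured pressure percentage would be derived from that per-tick delta; the gains and limits are illustrative, and the sample's tuning lives in system_psi.rs:
struct PiController {
    target_pct: f64,     // desired pressure over the tick window, in percent
    kp: f64,             // proportional gain (bytes per percent of error)
    ki: f64,             // integral gain
    integral: f64,       // accumulated error
    integral_limit: f64, // anti-windup clamp on the accumulated error
    deadband_pct: f64,   // ignore errors smaller than this
}

impl PiController {
    /// Returns the adjustment (in bytes) to apply to the current memory.high.
    fn update(&mut self, measured_pct: f64) -> f64 {
        let error = measured_pct - self.target_pct;
        // Deadband: small errors produce no adjustment and no integral growth.
        if error.abs() < self.deadband_pct {
            return 0.0;
        }
        // Anti-windup: keep the integral term bounded.
        self.integral = (self.integral + error).clamp(-self.integral_limit, self.integral_limit);
        // Positive error (pressure above target) yields a positive output,
        // i.e. raise memory.high to loosen the limit and relieve pressure.
        self.kp * error + self.ki * self.integral
    }
}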

Coordinated shutdown with cancellation tokens

The sample uses tokio_util::CancellationToken for hierarchical shutdown coordination. A global token created in main() is cloned and passed to each flow. The ProcessMonitor creates its own global token for all policy tasks:
select! {
    _ = cancel_token.cancelled() => break,
    _ = interval.tick() => { /* policy work */ }
}
Policies check for cancellation in their tick loops and call cleanup() before exiting. The orchestrator enforces a 190ms shutdown timeout, aborting tasks that fail to respond to cancellation.
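A sketch of the shutdown side of that contract follows; the per-task timeout here is a simplification of the sample's single shutdown budget:
use std::time::Duration;
use tokio::task::JoinHandle;
use tokio_util::sync::CancellationToken;

async fn shutdown(global: CancellationToken, flows: Vec<JoinHandle<()>>) {
    // Cancelling the parent token also cancels every child_token() derived
    // from it, so nested policy tick loops observe shutdown as well.
    global.cancel();
    for mut handle in flows {
        // Give each task a bounded window to run cleanup(), then abort it.
        if tokio::time::timeout(Duration::from_millis(190), &mut handle).await.is_err() {
            handle.abort();
        }
    }
}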

Extend the sample

  • Add a new policy: Implement the PolicyFactory and Policy traits (see crates/memflow/src/policy/telemetry.rs for a minimal example), register the factory in crates/memflow/src/lib.rs, and add configuration system properties.
  • Implement userspace logic for CPU or GPU flows: Add policies that consume the BPF-collected data from the CPU and GPU programs.
  • Add support for new BPF tracepoints: Modify the cgroup monitor or create a new BPF program with skeleton generation.