Gpt2Tokenizer Class
GPT-2 style BPE tokenizer for text-only LLMs (SmolLM, Qwen, Phi, etc.). Supports byte-level encoding and special tokens for chat templates.
Provides both synchronous and asynchronous (background thread) tokenization modes.
void Initialize ( TextAsset vocab, TextAsset merges, TextAsset config )
async Task< List< int > > EncodeAsync ( string text, CancellationToken ct )
Asynchronously encodes text to token IDs on a background thread.
Use this to avoid blocking the main thread and causing frame drops. Total encoding time may be longer than in synchronous mode, but the frame rate stays consistent.
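The following is a minimal usage sketch for EncodeAsync, not a definitive pattern. It relies only on the signatures listed on this page (Initialize, EncodeAsync, Decode). The component name TokenizerUsageSketch, the Tokenizer property, and the RoundTripAsync method are hypothetical. The sketch also assumes that the tokenizer has already been initialized elsewhere, that Gpt2Tokenizer is reachable without an extra namespace import, and that cancellation surfaces as an OperationCanceledException; none of these are confirmed by this reference.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using UnityEngine;

// Hypothetical helper component; not part of the documented API.
public class TokenizerUsageSketch : MonoBehaviour
{
    // Assumption: assigned by other code after Initialize(vocab, merges, config) has run.
    public Gpt2Tokenizer Tokenizer { get; set; }

    private CancellationTokenSource cts;

    private void OnEnable()
    {
        cts = new CancellationTokenSource();
    }

    private void OnDisable()
    {
        // Cancel any in-flight encode when the component is disabled.
        cts.Cancel();
        cts.Dispose();
    }

    // Encodes text off the main thread, then decodes the IDs back to a string.
    public async Task<string> RoundTripAsync(string text)
    {
        try
        {
            List<int> ids = await Tokenizer.EncodeAsync(text, cts.Token);
            Debug.Log($"Encoded {ids.Count} tokens");
            return Tokenizer.Decode(ids);
        }
        catch (OperationCanceledException)
        {
            // Assumption: a cancelled encode throws rather than returning a partial list.
            return null;
        }
    }
}
```

Tying the CancellationTokenSource to OnEnable and OnDisable is one way to make sure a pending encode does not outlive the object that requested it. A per-request token would work equally well if individual calls need to be cancelled separately.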
string Decode ( List< int > tokenIds )