API reference

Gpt2Tokenizer Class

GPT-2 style BPE tokenizer for text-only LLMs (SmolLM, Qwen, Phi, etc.). Supports byte-level encoding and special tokens for chat templates.
Provides both synchronous and asynchronous (background thread) tokenization modes.
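
To illustrate what byte-level encoding means, the following is a minimal sketch of the byte-to-character table used by the original GPT-2 tokenizer. It assumes this class follows the same convention, which the reference does not confirm. `ByteLevelSketch`, `BuildByteToChar`, and `ToByteLevel` are hypothetical names for illustration and are not part of `Gpt2Tokenizer`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Gpt2Tokenizer.
public static class ByteLevelSketch
{
    // Builds the GPT-2 byte-to-character table. Printable Latin-1 bytes map
    // to themselves; every other byte is assigned the next free code point
    // starting at U+0100, in increasing byte order.
    public static char[] BuildByteToChar()
    {
        var table = new char[256];
        int next = 0;
        for (int b = 0; b < 256; b++)
        {
            bool printable = (b >= '!' && b <= '~')
                          || (b >= 0xA1 && b <= 0xAC)
                          || (b >= 0xAE && b <= 0xFF);
            table[b] = printable ? (char)b : (char)(256 + next++);
        }
        return table;
    }

    // Converts text to its byte-level form: UTF-8 bytes, one mapped char each.
    // BPE merges would then be applied to this string (not shown here).
    public static string ToByteLevel(string text)
    {
        char[] table = BuildByteToChar();
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            sb.Append(table[b]);
        }
        return sb.ToString();
    }
}
```

Under this mapping a space (byte 32) becomes `Ġ` (U+0120), so `ByteLevelSketch.ToByteLevel("Hi there")` returns `"HiĠthere"`. Because every possible byte has a character, any input string can be represented without falling back to an unknown token at this stage.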

Properties

int EosTokenId [Get]
int PadTokenId [Get]
int UnkTokenId [Get]

Member Functions

void Initialize(TextAsset vocab, TextAsset merges, TextAsset config)
Asynchronously encodes text to token IDs on a background thread.
Use this to avoid blocking the main thread and to prevent frame drops. Total encoding time may be longer than with the synchronous call, but the frame rate stays consistent.
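
The trade-off described above can be sketched with the general .NET pattern of moving blocking work to a thread-pool thread. This is an illustration only and makes no claim about how the class implements its asynchronous mode. `BackgroundEncodeSketch`, `EncodeBlocking`, and `EncodeInBackground` are hypothetical names, and the stand-in encoder simply emits one ID per UTF-8 byte rather than real BPE tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical illustration, not part of Gpt2Tokenizer.
public static class BackgroundEncodeSketch
{
    // Stand-in for a synchronous encoder: one ID per UTF-8 byte.
    // A real tokenizer would apply BPE merges here instead.
    public static List<int> EncodeBlocking(string text)
    {
        var ids = new List<int>();
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            ids.Add(b);
        }
        return ids;
    }

    // Runs the blocking encoder on a thread-pool thread so the caller
    // can continue working and await the result later.
    public static Task<List<int>> EncodeInBackground(string text)
    {
        return Task.Run(() => EncodeBlocking(text));
    }
}
```

A caller would typically write `List<int> ids = await BackgroundEncodeSketch.EncodeInBackground(prompt);` inside an `async` method. Scheduling the task and handing the result back add some overhead, which is one reason a background call can take longer in total than the synchronous one.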
string Decode(List<int> tokenIds)