API reference

Gpt2Tokenizer Class

GPT-2-style BPE tokenizer for text-only LLMs (SmolLM, Qwen, Phi, etc.). Supports byte-level encoding and special tokens for chat templates.
Provides both synchronous and asynchronous (background thread) tokenization modes.

Properties

PadTokenId : int
[Get]
Signature
int PadTokenId

Methods

Decode ( tokenIds )
Signature
string Decode(List< int > tokenIds)
Parameters
tokenIds: List< int >
Returns
string
EncodeAsync ( text , ct )
Asynchronously encodes text to token IDs on a background thread.
Use this to keep tokenization off the main thread and avoid frame drops. It may take longer overall than synchronous encoding, but it keeps the frame rate consistent.
Signature
async Task< List< int > > EncodeAsync(string text, CancellationToken ct=default)
Parameters
text: string
ct: CancellationToken
Returns
Task< List< int > >
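A minimal sketch of calling EncodeAsync with a cancellation token, based only on the signature above. The helper class `TokenizerUsageSketch` and the method `EncodeWithTimeoutAsync` are hypothetical names introduced for illustration. The sketch assumes the tokenizer has already been set up with Initialize, and that a cancelled encode surfaces as an `OperationCanceledException`, which this reference does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class TokenizerUsageSketch
{
    // Hypothetical helper: encodes `text` without blocking the caller
    // and gives up after `timeoutMs` milliseconds.
    public static async Task<List<int>> EncodeWithTimeoutAsync(
        Gpt2Tokenizer tokenizer, string text, int timeoutMs)
    {
        // The source cancels its token automatically once the timeout elapses.
        using (var cts = new CancellationTokenSource(timeoutMs))
        {
            try
            {
                // Signature from this reference: EncodeAsync(string text, CancellationToken ct)
                return await tokenizer.EncodeAsync(text, cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Assumption: cancellation is reported by throwing this exception.
                return null;
            }
        }
    }
}
```

Under those assumptions, a caller inside an `async` method could write `var ids = await TokenizerUsageSketch.EncodeWithTimeoutAsync(tokenizer, prompt, 2000);` and treat a `null` result as a timed-out request.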
Initialize ( vocab , merges , config )
Signature
void Initialize(TextAsset vocab, TextAsset merges, TextAsset config)
Parameters
vocab: TextAsset
merges: TextAsset
config: TextAsset
Returns
void