API reference

OnDeviceLlmConfig Class

Configuration for on-device text-only LLM inference.
Contains model parameters, tokenizer, and chat template formatting.
IMPORTANT: Architecture parameters differ between models. You must configure these values to match the specific model you deploy. For example, Qwen2.5-0.5B uses:
  • maxLayers: 24
  • numKeyValueHeads: 2
  • headDim: 64
  • eosTokenId: 151645
  • vocabSize: 151936
Create separate config assets for each model with appropriate parameters.
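As an illustration of chatTemplateFormat: Qwen2.5 models use the ChatML conversation format, and the example eosTokenId above (151645) is Qwen's `<|im_end|>` token, which closes each ChatML turn. A matching template might look like the sketch below; the placeholder names are assumptions for illustration, so check how ApplyChatTemplate actually substitutes its userPrompt and systemMessage arguments before relying on them.

```text
<|im_start|>system
{systemMessage}<|im_end|>
<|im_start|>user
{userPrompt}<|im_end|>
<|im_start|>assistant
```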

Fields

InferenceExecutionMode inferenceExecutionMode[Get]
int stepsPerFrame[Get]
TextAsset vocabFile[Get]
TextAsset mergesFile[Get]
TextAsset tokenizerConfigFile[Get]
ulong vocabFileContentId[Get]
ulong mergesFileContentId[Get]
ulong tokenizerConfigFileContentId[Get]
string chatTemplateFormat[Get]
string defaultSystemMessage[Get]
int maxLayers[Get]
int numKeyValueHeads[Get]
int headDim[Get]
int eosTokenId[Get]
int maxNewTokens[Get]
int maxPromptLength[Get]

Properties

int StepsPerFrame[Get]
Gpt2Tokenizer Tokenizer[Get]

Member Functions

void InitializeTokenizer ( )
string ApplyChatTemplate ( string userPrompt, string systemMessage )
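A minimal usage sketch tying the members together. Only InitializeTokenizer, ApplyChatTemplate, and the listed fields come from this reference; the MonoBehaviour wrapper, the Inspector-assigned field, and the sample prompt are illustrative assumptions.

```csharp
using UnityEngine;

public class LlmPromptBuilder : MonoBehaviour
{
    // Assign a per-model config asset (e.g. a Qwen2.5-0.5B config) in the Inspector.
    // Hypothetical wiring — create one asset per model, as noted above.
    [SerializeField] private OnDeviceLlmConfig config;

    void Start()
    {
        // Build the tokenizer from the vocab/merges/tokenizer-config TextAssets
        // before formatting any prompts.
        config.InitializeTokenizer();

        // Wrap the raw user prompt in the model's chat template, passing the
        // config's default system message as the system turn.
        string prompt = config.ApplyChatTemplate(
            "Summarize this scene in one sentence.",
            config.defaultSystemMessage);

        Debug.Log(prompt);
    }
}
```

Because the tokenizer and template live on the config asset, swapping models at runtime is a matter of pointing this field at a different OnDeviceLlmConfig and re-initializing.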