Configuration for on-device text-only LLM inference.
Contains model parameters, tokenizer, and chat template formatting.
IMPORTANT: Different models have different architecture parameters; you must configure these values to match your specific model. For example:
Qwen2.5-0.5B:
  maxLayers: 24
  numKeyValueHeads: 2
  headDim: 64
  eosTokenId: 151645
  vocabSize: 151936
Create a separate config asset for each model, with parameters that match that model's architecture.
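As a minimal sketch of what such a per-model config might look like, here is a Python dataclass holding the Qwen2.5-0.5B values listed above. The class and field names (`LlmConfig`, `max_layers`, etc.) are illustrative assumptions, not part of any specific framework; only the numeric values come from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlmConfig:
    """Hypothetical per-model config; field names are illustrative."""
    model_name: str
    max_layers: int           # number of transformer layers
    num_key_value_heads: int  # KV heads (grouped-query attention)
    head_dim: int             # dimension of each attention head
    eos_token_id: int         # end-of-sequence token id
    vocab_size: int           # tokenizer vocabulary size


# Values for Qwen2.5-0.5B, matching the example above.
QWEN25_0_5B = LlmConfig(
    model_name="Qwen2.5-0.5B",
    max_layers=24,
    num_key_value_heads=2,
    head_dim=64,
    eos_token_id=151645,
    vocab_size=151936,
)
```

A config for another model would be a second `LlmConfig` instance with its own values, mirroring the "one asset per model" approach described above.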