API reference
API reference
Select your platform
No SDKs available
No versions available

OpenAIProvider Class

OpenAI provider implementing chat (Responses API), speech-to-text, and text-to-speech.
Supports optional image inputs for multimodal models when SupportsVision is enabled.
Guides: https://platform.openai.com/docs/ Used by UI and samples via IChatTask, ISpeechToTextTask, and ITextToSpeechTask.

Properties

Unique identifier for this provider type (e.g., "OpenAI", "LlamaApi").
Used to store and retrieve credentials from the central CredentialStorage.
When true, this provider asset uses its own API key instead of the central storage.
bool IChatTask. SupportsVision[Get]
Indicates whether this provider can handle vision inputs (images) alongside text during chat.
When true, ChatAsync will package ImageInput items using the OpenAI Responses format and, depending on settings, inline or resolve remote URLs. Toggle this for models like GPT-4o or any multimodal model that supports image understanding.Controlled by the serialized supportsVision field and reported via IChatTask.SupportsVision. See also inlineRemoteImages and resolveRemoteRedirects to influence how remote images are prepared.
Declares the default inference location supported by this provider: cloud.
Used by AIProviderBase to filter capability and route tasks. This provider does not advertise local/edge execution out of the box; if you need on-device models, use a provider that returns InferenceType.OnDevice, InferenceType.Cloud or InferenceType.LocalServer.

Member Functions

Sends a chat turn to OpenAI's Responses API and returns the assistant's text reply.
Validates apiKey and model, builds a single input message per the Responses schema, and POSTs to {apiRoot}/v1/responses. The method extracts output_text if present, or falls back to the first text item in the first output message. OpenAI Responses docs: https://platform.openai.com/docs/guides/responses/ See alsoAIProviderBaseIChatTask.
Parameters
req
The user message and optional ImageInput list. If SupportsVision is enabled, images are serialized as input_image items; remote images can be inlined or have redirects resolved depending on inlineRemoteImages and resolveRemoteRedirects.
stream
Optional incremental callback for partial text via ChatDelta. This implementation reports the final text once per call (non-streaming HTTP). Use to update UI progressively.
ct
Cancellation token for image preparation and HTTP. Cancels the request if the operation is aborted.
Returns
A ChatResponse containing the assistant text and the raw JSON payload for debugging or downstream parsing.
Transcribes an audio clip using OpenAI audio/transcriptions and returns plain text.
Honors sttResponseFormat (e.g., json or text) and optional sttTemperature. Requires valid apiKey and model. POSTs to {apiRoot}/v1/audio/transcriptions. OpenAI Transcriptions: https://platform.openai.com/docs/guides/speech-to-text See also ISpeechToTextTask.
Parameters
audioBytes
Raw audio data (e.g., WAV). Throws if null or empty. The content is sent as multipart/form-data.
language
Optional ISO language override (for example, "en", "de"). If null/empty, falls back to sttLanguage or lets OpenAI auto-detect.
ct
Cancellation token for the HTTP request.
Returns
Transcript text. If sttResponseFormat is "text", the raw body is returned.
Exceptions
ArgumentException
Thrown when audioBytes is null or empty.
InvalidOperationException
Thrown if apiKey or model is missing.
Coroutine that synthesizes speech with OpenAI audio/speech and yields a Unity AudioClip.
Selects AudioType based on ttsOutputFormat (e.g., WAV/MP3). Requires valid apiKey and model. POSTs to {apiRoot}/v1/audio/speech, then streams the response into a AudioClip via the internal HTTP helper. OpenAI TTS: https://platform.openai.com/docs/guides/text-to-speech See also ITextToSpeechTask.
Parameters
text
Input text to speak. Logs and exits if empty. Combined with ttsVoice and optional ttsInstructions to control style, plus ttsSpeed for playback rate.
voice
Optional voice name override. If null/empty, uses ttsVoice.
onReady
Callback invoked with the created AudioClip once download/decoding completes.