API reference

OpenAIProvider Class

OpenAI provider implementing chat (Responses API), speech-to-text, and text-to-speech.
Supports optional image inputs for multimodal models when SupportsVision is enabled.
Guides: https://platform.openai.com/docs/
Used by UI and samples via IChatTask, ISpeechToTextTask, and ITextToSpeechTask.

Protected Properties

DefaultSupportedTypes : override InferenceType
[Get]
Declares the default inference location supported by this provider: cloud.
Used by AIProviderBase to filter capabilities and route tasks. This provider does not advertise local or edge execution out of the box; if you need on-device models, use a provider that returns InferenceType.OnDevice or InferenceType.LocalServer.
Signature
override InferenceType DefaultSupportedTypes

Properties

SupportsVision : bool
[Get]
Indicates whether this provider can handle vision inputs (images) alongside text during chat.
When true, ChatAsync packages ImageInput items using the OpenAI Responses format and, depending on settings, inlines remote images or resolves their redirects. Enable this for models like GPT-4o or any multimodal model that supports image understanding.
Controlled by the serialized supportsVision field and reported via IChatTask.SupportsVision. See also inlineRemoteImages and resolveRemoteRedirects to influence how remote images are prepared.
Signature
bool SupportsVision

Methods

ChatAsync ( req , stream , ct )
Sends a chat turn to OpenAI's Responses API and returns the assistant's text reply.
Validates apiKey and model, builds a single input message per the Responses schema, and POSTs to {apiRoot}/v1/responses. The method extracts output_text if present, or falls back to the first text item in the first output message.
See also AIProviderBase, IChatTask.
Signature
async Task< ChatResponse > ChatAsync(ChatRequest req, IProgress< ChatDelta > stream=null, CancellationToken ct=default)
Parameters
req: ChatRequest  The user message and optional ImageInput list. If SupportsVision is enabled, images are serialized as input_image items; remote images can be inlined or have redirects resolved depending on inlineRemoteImages and resolveRemoteRedirects.
stream: IProgress< ChatDelta >  Optional callback that receives partial text as ChatDelta values. Because this implementation uses non-streaming HTTP, it reports the final text once per call rather than incrementally; the callback can still be used to update the UI when the reply arrives.
ct: CancellationToken  Cancellation token for image preparation and HTTP. Cancels the request if the operation is aborted.
Returns
Task< ChatResponse >  A ChatResponse containing the assistant text and the raw JSON payload for debugging or downstream parsing.
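The extraction order described for ChatAsync (output_text first, then the first text item in the first output message) can be sketched as below. This is an illustration only, not the provider's actual code: ResponseTextExtractor and ExtractText are hypothetical names, and System.Text.Json is an assumption (a Unity project may use a different JSON library).

```csharp
using System;
using System.Text.Json;

// Hypothetical helper; not part of OpenAIProvider.
public static class ResponseTextExtractor
{
    // Returns the assistant text from a Responses-style JSON payload, or null if none is found.
    public static string ExtractText(string rawJson)
    {
        using JsonDocument doc = JsonDocument.Parse(rawJson);
        JsonElement root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object) return null;

        // 1) Prefer a top-level output_text string when present.
        if (root.TryGetProperty("output_text", out JsonElement direct) &&
            direct.ValueKind == JsonValueKind.String)
        {
            return direct.GetString();
        }

        // 2) Otherwise locate the first item of type "message" in the output array.
        if (!root.TryGetProperty("output", out JsonElement output) ||
            output.ValueKind != JsonValueKind.Array)
        {
            return null;
        }

        foreach (JsonElement item in output.EnumerateArray())
        {
            if (item.ValueKind != JsonValueKind.Object) continue;
            if (!item.TryGetProperty("type", out JsonElement type) ||
                type.ValueKind != JsonValueKind.String ||
                type.GetString() != "message")
            {
                continue;
            }

            // Return the first content part of that message that carries a text string.
            if (item.TryGetProperty("content", out JsonElement content) &&
                content.ValueKind == JsonValueKind.Array)
            {
                foreach (JsonElement part in content.EnumerateArray())
                {
                    if (part.ValueKind == JsonValueKind.Object &&
                        part.TryGetProperty("text", out JsonElement text) &&
                        text.ValueKind == JsonValueKind.String)
                    {
                        return text.GetString();
                    }
                }
            }

            // Only the first message is inspected, matching the description above.
            return null;
        }

        return null;
    }
}
```

A helper like this could also be applied to the raw JSON payload carried by ChatResponse when downstream code needs to re-parse the reply.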
SynthesizeStreamCoroutine ( text , voice , onReady )
Coroutine that synthesizes speech with OpenAI audio/speech and yields a Unity AudioClip.
Selects AudioType based on ttsOutputFormat (e.g., WAV/MP3). Requires valid apiKey and model. POSTs to {apiRoot}/v1/audio/speech, then streams the response into an AudioClip via the internal HTTP helper.
See also ITextToSpeechTask.
Signature
IEnumerator SynthesizeStreamCoroutine(string text, string voice=null, Action< AudioClip > onReady=null)
Parameters
text: string  Input text to speak. Logs and exits if empty. Combined with ttsVoice and optional ttsInstructions to control style, plus ttsSpeed for playback rate.
voice: string  Optional voice name override. If null/empty, uses ttsVoice.
onReady: Action< AudioClip >  Callback invoked with the created AudioClip once download/decoding completes.
Returns
IEnumerator
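A minimal usage sketch for the coroutine, based only on the signature above. The SpeakOnStart component, its serialized fields, and the null check on the clip are assumptions for illustration; you will also need a using directive for whichever namespace declares OpenAIProvider in your project.

```csharp
using UnityEngine;
// Assumption: add a using directive for the namespace that declares OpenAIProvider.

// Hypothetical component that speaks one sentence when the scene starts.
public class SpeakOnStart : MonoBehaviour
{
    [SerializeField] private OpenAIProvider provider;   // assigned in the Inspector
    [SerializeField] private AudioSource audioSource;   // plays the synthesized clip

    private void Start()
    {
        // voice: null falls back to the provider's configured ttsVoice.
        StartCoroutine(provider.SynthesizeStreamCoroutine(
            "Hello from the OpenAI provider.",
            voice: null,
            onReady: OnClipReady));
    }

    private void OnClipReady(AudioClip clip)
    {
        // Defensive check; whether the callback can receive null is an assumption.
        if (clip == null)
        {
            Debug.LogWarning("No AudioClip was produced.");
            return;
        }

        audioSource.clip = clip;
        audioSource.Play();
    }
}
```

Because the method returns an IEnumerator, it has to be driven with StartCoroutine from a MonoBehaviour; calling it directly does not start the request.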
TranscribeAsync ( audioBytes , language , ct )
Transcribes an audio clip using OpenAI audio/transcriptions and returns plain text.
Honors sttResponseFormat (e.g., json or text) and optional sttTemperature. Requires valid apiKey and model. POSTs to {apiRoot}/v1/audio/transcriptions.
See also ISpeechToTextTask.
Signature
async Task< string > TranscribeAsync(byte[] audioBytes, string language=null, CancellationToken ct=default)
Parameters
audioBytes: byte[]  Raw audio data (e.g., WAV). Throws if null or empty. The content is sent as multipart/form-data.
language: string  Optional ISO language override (for example, "en", "de"). If null/empty, falls back to sttLanguage or lets OpenAI auto-detect.
ct: CancellationToken  Cancellation token for the HTTP request.
Returns
Task< string >  Transcript text. If sttResponseFormat is "text", the raw body is returned.
Throws
ArgumentException  Thrown when audioBytes is null or empty.
InvalidOperationException  Thrown if apiKey or model is missing.
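A hedged usage sketch for TranscribeAsync that handles the two documented exceptions. The TranscribeExample component, the wavPath field, and the 30-second timeout are hypothetical; treating cancellation as an OperationCanceledException is also an assumption, and a using directive for the namespace that declares OpenAIProvider is required.

```csharp
using System;
using System.IO;
using System.Threading;
using UnityEngine;
// Assumption: add a using directive for the namespace that declares OpenAIProvider.

// Hypothetical component that transcribes one audio file on start.
public class TranscribeExample : MonoBehaviour
{
    [SerializeField] private OpenAIProvider provider;  // assigned in the Inspector
    [SerializeField] private string wavPath;           // hypothetical path to a WAV file

    private CancellationTokenSource cts;

    private async void Start()
    {
        // Illustrative timeout so a stalled request does not hang indefinitely.
        cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        try
        {
            byte[] audioBytes = File.ReadAllBytes(wavPath);
            string transcript = await provider.TranscribeAsync(audioBytes, "en", cts.Token);
            Debug.Log($"Transcript: {transcript}");
        }
        catch (ArgumentException ex)
        {
            // Documented: audioBytes was null or empty (File.ReadAllBytes may also throw this for a bad path).
            Debug.LogError($"Invalid input: {ex.Message}");
        }
        catch (InvalidOperationException ex)
        {
            // Documented: apiKey or model is missing.
            Debug.LogError($"Provider is not configured: {ex.Message}");
        }
        catch (OperationCanceledException)
        {
            // Assumption: cancellation surfaces as OperationCanceledException.
            Debug.LogWarning("Transcription was canceled or timed out.");
        }
        catch (IOException ex)
        {
            Debug.LogError($"Could not read audio file: {ex.Message}");
        }
    }

    private void OnDestroy()
    {
        // Cancel any in-flight request when the component goes away.
        cts?.Cancel();
        cts?.Dispose();
    }
}
```

Passing "en" overrides sttLanguage for this call; passing null instead leaves the choice to sttLanguage or to OpenAI's auto-detection, as described for the language parameter.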