RealtimeResponseCreateParams
object
Create a new Realtime response with these parameters.
The set of modalities the model can respond with. To disable audio,
set this to ["text"].
Allowed values: text, audio
The default system instructions (i.e. system message) prepended to model
calls. This field allows the client to guide the model on desired
responses. The model can be instructed on response content and format
(e.g. “be extremely succinct”, “act friendly”, “here are examples of good
responses”) and on audio behavior (e.g. “talk quickly”, “inject emotion
into your voice”, “laugh frequently”). The instructions are not guaranteed
to be followed by the model, but they provide guidance to the model on the
desired behavior.
Note that the server sets default instructions which will be used if this
field is not set and are visible in the session.created event at the
start of the session.
The voice the model uses to respond. Voice cannot be changed during the
session once the model has responded with audio at least once. Current
voice options are alloy, ash, ballad, coral, echo, sage,
shimmer, and verse.
Allowed values: alloy, ash, ballad, coral, echo, sage, shimmer, verse
The format of output audio. Options are pcm16, g711_ulaw, or g711_alaw.
Allowed values: pcm16, g711_ulaw, g711_alaw
Tools (functions) available to the model.
How the model chooses tools. Options are auto, none, required, or
specify a function, like {"type": "function", "function": {"name": "my_function"}}.
Sampling temperature for the model, limited to [0.6, 1.2]. Defaults to 0.8.
Maximum number of output tokens for a single assistant response,
inclusive of tool calls. Provide an integer between 1 and 4096 to
limit output tokens, or inf for the maximum available tokens for a
given model. Defaults to inf.
Controls which conversation the response is added to. Currently supports
auto and none, with auto as the default value. The auto value
means that the contents of the response will be added to the default
conversation. Set this to none to create an out-of-band response which
will not add items to the default conversation.
Set of 16 key-value pairs that can be attached to an object. This can be
useful for storing additional information about the object in a structured
format, and querying for objects via API or the dashboard.
Keys are strings with a maximum length of 64 characters. Values are strings
with a maximum length of 512 characters.
The item to add to the conversation.
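As a sketch, the parameters above can be assembled into a response.create client event. The field names below mirror the descriptions in this section, but the exact wire format should be checked against the official reference; this is an illustration, not a definitive payload.

```python
import json

# Hypothetical sketch: build the parameters described above for a
# response.create event. Field names follow the descriptions here.
def build_response_params(instructions: str, max_tokens="inf") -> dict:
    return {
        "modalities": ["text", "audio"],           # use ["text"] to disable audio
        "instructions": instructions,
        "voice": "alloy",                          # fixed once audio has been produced
        "output_audio_format": "pcm16",            # or g711_ulaw / g711_alaw
        "temperature": 0.8,                        # limited to [0.6, 1.2]
        "max_response_output_tokens": max_tokens,  # 1..4096, or "inf"
        "conversation": "auto",                    # "none" for an out-of-band response
    }

event = {"type": "response.create",
         "response": build_response_params("Be extremely succinct.")}
print(json.dumps(event, indent=2))
```

Setting conversation to "none" while supplying instructions is a common way to run a side task (e.g. classification) without polluting the default conversation.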
RealtimeServerEventConversationCreated
object
Returned when a conversation is created. Emitted right after session creation.
The unique ID of the server event.
The event type, must be conversation.created.
Allowed values: conversation.created
The conversation resource.
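A minimal client might record the conversation ID when this event arrives. The event shape below is illustrative, following the fields described above:

```python
# Minimal dispatch sketch (illustrative event shape): record the
# conversation ID when conversation.created arrives, which happens
# right after the session is established.
def handle_event(state: dict, event: dict) -> dict:
    if event["type"] == "conversation.created":
        state["conversation_id"] = event["conversation"]["id"]
    return state

state = handle_event({}, {
    "event_id": "event_1",
    "type": "conversation.created",
    "conversation": {"id": "conv_1", "object": "realtime.conversation"},
})
```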
RealtimeServerEventConversationItemCreated
object
Returned when a conversation item is created. There are several scenarios that
produce this event:
- The server is generating a Response, which if successful will produce
  either one or two Items, which will be of type message
  (role assistant) or type function_call.
- The input audio buffer has been committed, either by the client or the
  server (in server_vad mode). The server will take the content of the
  input audio buffer and add it to a new user message Item.
- The client has sent a conversation.item.create event to add a new Item
  to the Conversation.
The unique ID of the server event.
The event type, must be conversation.item.created.
Allowed values: conversation.item.created
The ID of the preceding item in the Conversation context, allows the
client to understand the order of the conversation.
The item to add to the conversation.
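Because each event carries the ID of the preceding item, a client can keep a local ordered view of the conversation. A sketch under illustrative event shapes (a missing previous_item_id is treated here as "start of the conversation"):

```python
# Sketch: maintain a local ordered item list using previous_item_id
# from conversation.item.created events. Event shapes are illustrative.
def insert_item(items: list, event: dict) -> None:
    prev = event.get("previous_item_id")
    if prev is None:
        items.insert(0, event["item"])  # no predecessor: start of conversation
        return
    idx = next(i for i, it in enumerate(items) if it["id"] == prev)
    items.insert(idx + 1, event["item"])

items = []
insert_item(items, {"previous_item_id": None,
                    "item": {"id": "item_1", "role": "user"}})
insert_item(items, {"previous_item_id": "item_1",
                    "item": {"id": "item_2", "role": "assistant"}})
```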
RealtimeServerEventConversationItemDeleted
object
Returned when an item in the conversation is deleted by the client with a
conversation.item.delete event. This event is used to synchronize the
server’s understanding of the conversation history with the client’s view.
The unique ID of the server event.
The event type, must be conversation.item.deleted.
Allowed values: conversation.item.deleted
The ID of the item that was deleted.
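Mirroring this event into the client's local state is straightforward; a sketch with an illustrative event shape:

```python
# Sketch: apply a conversation.item.deleted event to a local item list
# so the client's view stays in sync with the server's history.
def delete_item(items: list, event: dict) -> list:
    return [it for it in items if it["id"] != event["item_id"]]

items = [{"id": "item_1"}, {"id": "item_2"}]
items = delete_item(items, {"type": "conversation.item.deleted",
                            "item_id": "item_1"})
```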
RealtimeServerEventConversationItemInputAudioTranscriptionCompleted
object
This event is the output of audio transcription for user audio written to the
user audio buffer. Transcription begins when the input audio buffer is
committed by the client or server (in server_vad mode). Transcription runs
asynchronously with Response creation, so this event may come before or after
the Response events.
Realtime API models accept audio natively, and thus input transcription is a
separate process run on a separate ASR (Automatic Speech Recognition) model,
currently always whisper-1. Thus the transcript may diverge somewhat from
the model’s interpretation, and should be treated as a rough guide.
The unique ID of the server event.
The event type, must be
conversation.item.input_audio_transcription.completed.
Allowed values: conversation.item.input_audio_transcription.completed
The ID of the user message item containing the audio.
The index of the content part containing the audio.
The transcribed text.
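Since transcription runs asynchronously and may race the Response events, a client can attach the transcript to its local copy of the item keyed by item_id and content_index, and handle the case where the item has not been seen yet. Item and content shapes below are illustrative:

```python
# Sketch: attach an asynchronously delivered transcript to the matching
# content part of a user message item, keyed by item_id and content_index.
def apply_transcript(items: list, event: dict) -> bool:
    for it in items:
        if it["id"] == event["item_id"]:
            it["content"][event["content_index"]]["transcript"] = event["transcript"]
            return True
    return False  # item not seen yet: transcription can race Response events

items = [{"id": "item_1",
          "content": [{"type": "input_audio", "transcript": None}]}]
applied = apply_transcript(items, {
    "item_id": "item_1",
    "content_index": 0,
    "transcript": "Hello there.",
})
```

A False return is a cue to buffer the event and retry once the corresponding conversation.item.created event arrives.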