跳过内容

实时配置

运行配置

基础: TypedDict

用于运行实时代理会话的配置。

源代码在 src/agents/realtime/config.py
class RealtimeRunConfig(TypedDict):
    """Configuration for running a realtime agent session."""

    model_settings: NotRequired[RealtimeSessionModelSettings]
    """Settings for the realtime model session."""

    output_guardrails: NotRequired[list[OutputGuardrail[Any]]]
    """List of output guardrails to run on the agent's responses."""

    guardrails_settings: NotRequired[RealtimeGuardrailsSettings]
    """Settings for guardrail execution."""

    tracing_disabled: NotRequired[bool]
    """Whether tracing is disabled for this run."""

    async_tool_calls: NotRequired[bool]
    """Whether function tool calls should run asynchronously. Defaults to True."""

model_settings 实例属性

model_settings: NotRequired[RealtimeSessionModelSettings]

实时模型会话的设置。

output_guardrails 实例属性

output_guardrails: NotRequired[list[OutputGuardrail[Any]]]

应用于代理响应的输出 Guardrails 列表。

guardrails_settings 实例属性

guardrails_settings: NotRequired[RealtimeGuardrailsSettings]

Guardrail 执行的设置。

tracing_disabled 实例属性

tracing_disabled: NotRequired[bool]

是否禁用本次运行的 Tracing。

async_tool_calls 实例属性

async_tool_calls: NotRequired[bool]

函数工具调用是否应异步运行。默认为 True。

模型设置

基础: TypedDict

实时模型会话的模型设置。

源代码在 src/agents/realtime/config.py
class RealtimeSessionModelSettings(TypedDict):
    """Model settings for a realtime model session."""

    model_name: NotRequired[RealtimeModelName]
    """The name of the realtime model to use."""

    instructions: NotRequired[str]
    """System instructions for the model."""

    prompt: NotRequired[Prompt]
    """The prompt to use for the model."""

    modalities: NotRequired[list[Literal["text", "audio"]]]
    """The modalities the model should support."""

    output_modalities: NotRequired[list[Literal["text", "audio"]]]
    """The output modalities the model should support."""

    audio: NotRequired[RealtimeAudioConfig]
    """The audio configuration for the session."""

    voice: NotRequired[str]
    """The voice to use for audio output."""

    speed: NotRequired[float]
    """The speed of the model's responses."""

    input_audio_format: NotRequired[RealtimeAudioFormat | OpenAIRealtimeAudioFormats]
    """The format for input audio streams."""

    output_audio_format: NotRequired[RealtimeAudioFormat | OpenAIRealtimeAudioFormats]
    """The format for output audio streams."""

    input_audio_transcription: NotRequired[RealtimeInputAudioTranscriptionConfig]
    """Configuration for transcribing input audio."""

    input_audio_noise_reduction: NotRequired[RealtimeInputAudioNoiseReductionConfig | None]
    """Noise reduction configuration for input audio."""

    turn_detection: NotRequired[RealtimeTurnDetectionConfig]
    """Configuration for detecting conversation turns."""

    tool_choice: NotRequired[ToolChoice]
    """How the model should choose which tools to call."""

    tools: NotRequired[list[Tool]]
    """List of tools available to the model."""

    handoffs: NotRequired[list[Handoff]]
    """List of handoff configurations."""

    tracing: NotRequired[RealtimeModelTracingConfig | None]
    """Configuration for request tracing."""

model_name 实例属性

model_name: NotRequired[RealtimeModelName]

要使用的实时模型的名称。

instructions 实例属性

instructions: NotRequired[str]

模型的系统指令。

prompt 实例属性

prompt: NotRequired[Prompt]

模型要使用的提示语。

modalities 实例属性

modalities: NotRequired[list[Literal['text', 'audio']]]

模型应支持的模态。

output_modalities 实例属性

output_modalities: NotRequired[
    list[Literal["text", "audio"]]
]

模型应支持的输出模态。

audio 实例属性

audio: NotRequired[RealtimeAudioConfig]

会话的音频配置。

voice 实例属性

voice: NotRequired[str]

用于音频输出的声音。

speed 实例属性

speed: NotRequired[float]

模型响应的速度。

input_audio_format 实例属性

input_audio_format: NotRequired[
    RealtimeAudioFormat | RealtimeAudioFormats
]

输入音频流的格式。

output_audio_format 实例属性

output_audio_format: NotRequired[
    RealtimeAudioFormat | RealtimeAudioFormats
]

输出音频流的格式。

input_audio_transcription 实例属性

input_audio_transcription: NotRequired[
    RealtimeInputAudioTranscriptionConfig
]

输入音频转录的配置。

input_audio_noise_reduction 实例属性

input_audio_noise_reduction: NotRequired[
    RealtimeInputAudioNoiseReductionConfig | None
]

输入音频的降噪配置。

turn_detection 实例属性

turn_detection: NotRequired[RealtimeTurnDetectionConfig]

对话轮次检测的配置。

tool_choice 实例属性

tool_choice: NotRequired[ToolChoice]

模型应如何选择要调用的工具。

tools 实例属性

tools: NotRequired[list[Tool]]

模型可用的工具列表。

handoffs 实例属性

handoffs: NotRequired[list[Handoff]]

移交配置列表。

tracing 实例属性

tracing: NotRequired[RealtimeModelTracingConfig | None]

请求 Tracing 的配置。

音频配置

基础: TypedDict

实时会话中音频转录的配置。

源代码在 src/agents/realtime/config.py
class RealtimeInputAudioTranscriptionConfig(TypedDict):
    """Configuration for audio transcription in realtime sessions."""

    language: NotRequired[str]
    """The language code for transcription."""

    model: NotRequired[Literal["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"] | str]
    """The transcription model to use."""

    prompt: NotRequired[str]
    """An optional prompt to guide transcription."""

language 实例属性

language: NotRequired[str]

转录的语言代码。

model 实例属性

model: NotRequired[
    Literal[
        "gpt-4o-transcribe",
        "gpt-4o-mini-transcribe",
        "whisper-1",
    ]
    | str
]

要使用的转录模型。

prompt 实例属性

prompt: NotRequired[str]

用于指导转录的可选提示语。

基础: TypedDict

输入音频的降噪配置。

源代码在 src/agents/realtime/config.py
class RealtimeInputAudioNoiseReductionConfig(TypedDict):
    """Noise reduction configuration for input audio."""

    type: NotRequired[Literal["near_field", "far_field"]]
    """Noise reduction mode to apply to input audio."""

type 实例属性

type: NotRequired[Literal['near_field', 'far_field']]

应用于输入音频的降噪模式。

基础: TypedDict

轮次检测配置。如果需要,允许额外的供应商密钥。

源代码在 src/agents/realtime/config.py
class RealtimeTurnDetectionConfig(TypedDict):
    """Turn detection config. Allows extra vendor keys if needed."""

    type: NotRequired[Literal["semantic_vad", "server_vad"]]
    """The type of voice activity detection to use."""

    create_response: NotRequired[bool]
    """Whether to create a response when a turn is detected."""

    eagerness: NotRequired[Literal["auto", "low", "medium", "high"]]
    """How eagerly to detect turn boundaries."""

    interrupt_response: NotRequired[bool]
    """Whether to allow interrupting the assistant's response."""

    prefix_padding_ms: NotRequired[int]
    """Padding time in milliseconds before turn detection."""

    silence_duration_ms: NotRequired[int]
    """Duration of silence in milliseconds to trigger turn detection."""

    threshold: NotRequired[float]
    """The threshold for voice activity detection."""

    idle_timeout_ms: NotRequired[int]
    """Threshold for server-vad to trigger a response if the user is idle for this duration."""

type 实例属性

type: NotRequired[Literal['semantic_vad', 'server_vad']]

要使用的语音活动检测类型。

create_response 实例属性

create_response: NotRequired[bool]

检测到轮次时是否创建响应。

eagerness 实例属性

eagerness: NotRequired[
    Literal["auto", "low", "medium", "high"]
]

检测轮次边界的积极程度。

interrupt_response 实例属性

interrupt_response: NotRequired[bool]

是否允许中断助手的响应。

prefix_padding_ms 实例属性

prefix_padding_ms: NotRequired[int]

轮次检测前的时间填充(毫秒)。

silence_duration_ms 实例属性

silence_duration_ms: NotRequired[int]

触发轮次检测的静默持续时间(毫秒)。

threshold 实例属性

threshold: NotRequired[float]

语音活动检测的阈值。

idle_timeout_ms 实例属性

idle_timeout_ms: NotRequired[int]

如果用户在此持续时间内处于空闲状态,则 server-vad 触发响应的阈值。

Guardrails 设置

基础: TypedDict

实时会话中输出 Guardrails 的设置。

源代码在 src/agents/realtime/config.py
class RealtimeGuardrailsSettings(TypedDict):
    """Settings for output guardrails in realtime sessions."""

    debounce_text_length: NotRequired[int]
    """
    The minimum number of characters to accumulate before running guardrails on transcript
    deltas. Defaults to 100. Guardrails run every time the accumulated text reaches
    1x, 2x, 3x, etc. times this threshold.
    """

debounce_text_length 实例属性

debounce_text_length: NotRequired[int]

在 Guardrails 上运行转录 delta 之前要累积的最小字符数。默认为 100。Guardrails 每次累积文本达到此阈值的 1 倍、2 倍、3 倍等倍数时运行。

模型配置

基础: TypedDict

连接到实时模型的选项。

源代码在 src/agents/realtime/model.py
class RealtimeModelConfig(TypedDict):
    """Options for connecting to a realtime model."""

    api_key: NotRequired[str | Callable[[], MaybeAwaitable[str]]]
    """The API key (or function that returns a key) to use when connecting. If unset, the model will
    try to use a sane default. For example, the OpenAI Realtime model will try to use the
    `OPENAI_API_KEY`  environment variable.
    """

    url: NotRequired[str]
    """The URL to use when connecting. If unset, the model will use a sane default. For example,
    the OpenAI Realtime model will use the default OpenAI WebSocket URL.
    """

    headers: NotRequired[dict[str, str]]
    """The headers to use when connecting. If unset, the model will use a sane default.
    Note that, when you set this, authorization header won't be set under the hood.
    e.g., {"api-key": "your api key here"} for Azure OpenAI Realtime WebSocket connections.
    """

    initial_model_settings: NotRequired[RealtimeSessionModelSettings]
    """The initial model settings to use when connecting."""

    playback_tracker: NotRequired[RealtimePlaybackTracker]
    """The playback tracker to use when tracking audio playback progress. If not set, the model will
    use a default implementation that assumes audio is played immediately, at realtime speed.

    A playback tracker is useful for interruptions. The model generates audio much faster than
    realtime playback speed. So if there's an interruption, its useful for the model to know how
    much of the audio has been played by the user. In low-latency scenarios, it's fine to assume
    that audio is played back immediately at realtime speed. But in scenarios like phone calls or
    other remote interactions, you can set a playback tracker that lets the model know when audio
    is played to the user.
    """

    call_id: NotRequired[str]
    """Attach to an existing realtime call instead of creating a new session.

    When provided, the transport connects using the `call_id` query string parameter rather than a
    model name. This is used for SIP-originated calls that are accepted via the Realtime Calls API.
    """

api_key 实例属性

api_key: NotRequired[
    str | Callable[[], MaybeAwaitable[str]]
]

连接时要使用的 API 密钥(或返回密钥的函数)。如果未设置,模型将尝试使用合理的默认值。例如,OpenAI 实时模型将尝试使用 OPENAI_API_KEY 环境变量。

url 实例属性

url: NotRequired[str]

连接时要使用的 URL。如果未设置,模型将使用合理的默认值。例如,OpenAI 实时模型将使用默认的 OpenAI WebSocket URL。

headers 实例属性

headers: NotRequired[dict[str, str]]

连接时要使用的标头。如果未设置,模型将使用合理的默认值。请注意,当您设置此项时,授权标头不会在后台设置。例如,对于 Azure OpenAI 实时 WebSocket 连接,使用 {"api-key": "您的 api 密钥"}。

initial_model_settings 实例属性

initial_model_settings: NotRequired[
    RealtimeSessionModelSettings
]

连接时要使用的初始模型设置。

playback_tracker 实例属性

playback_tracker: NotRequired[RealtimePlaybackTracker]

用于跟踪音频播放进度的播放跟踪器。如果未设置,模型将使用默认实现,该实现假定音频立即以实时速度播放。

播放跟踪器对于中断很有用。模型比实时播放速度快得多地生成音频。因此,如果发生中断,模型知道用户播放了多少音频是有用的。在低延迟场景中,假设音频立即以实时速度播放是可以的。但在电话或其他远程交互等场景中,您可以设置一个播放跟踪器,让模型知道音频何时播放给用户。

call_id 实例属性

call_id: NotRequired[str]

附加到现有的实时通话,而不是创建新的会话。

提供后,传输将使用 call_id 查询字符串参数而不是模型名称进行连接。这用于通过实时通话 API 接受的 SIP 发起的呼叫。

Tracing 配置

基础: TypedDict

实时模型会话中 Tracing 的配置。

源代码在 src/agents/realtime/config.py
class RealtimeModelTracingConfig(TypedDict):
    """Configuration for tracing in realtime model sessions."""

    workflow_name: NotRequired[str]
    """The workflow name to use for tracing."""

    group_id: NotRequired[str]
    """A group identifier to use for tracing, to link multiple traces together."""

    metadata: NotRequired[dict[str, Any]]
    """Additional metadata to include with the trace."""

workflow_name 实例属性

workflow_name: NotRequired[str]

用于 Tracing 的工作流名称。

group_id 实例属性

group_id: NotRequired[str]

用于 Tracing 的组标识符,以将多个 Traces 链接在一起。

metadata 实例属性

metadata: NotRequired[dict[str, Any]]

包含在 Traces 中的其他元数据。

用户输入类型

可以为字符串或结构化消息的用户输入。

基础: TypedDict

来自用户的文本输入。

源代码在 src/agents/realtime/config.py
class RealtimeUserInputText(TypedDict):
    """A text input from the user."""

    type: Literal["input_text"]
    """The type identifier for text input."""

    text: str
    """The text content from the user."""

type 实例属性

type: Literal['input_text']

文本输入的类型标识符。

text 实例属性

text: str

来自用户的文本内容。

基础: TypedDict

来自用户的消息输入。

源代码在 src/agents/realtime/config.py
class RealtimeUserInputMessage(TypedDict):
    """A message input from the user."""

    type: Literal["message"]
    """The type identifier for message inputs."""

    role: Literal["user"]
    """The role identifier for user messages."""

    content: list[RealtimeUserInputText | RealtimeUserInputImage]
    """List of content items (text and image) in the message."""

type 实例属性

type: Literal['message']

消息输入的类型标识符。

role 实例属性

role: Literal['user']

用户消息的角色标识符。

content 实例属性

content: list[
    RealtimeUserInputText | RealtimeUserInputImage
]

消息中的内容项列表(文本和图像)。

客户端消息

基础: TypedDict

要发送给模型的原始消息。

源代码在 src/agents/realtime/config.py
class RealtimeClientMessage(TypedDict):
    """A raw message to be sent to the model."""

    type: str  # explicitly required
    """The type of the message."""

    other_data: NotRequired[dict[str, Any]]
    """Merged into the message body."""

type 实例属性

type: str

消息的类型。

other_data 实例属性

other_data: NotRequired[dict[str, Any]]

合并到消息正文中。

类型别名

实时模型的名称。

实时音频流的音频格式。