Ultravox Realtime LLM

The `UltravoxRealtimeLLMService` provides access to Ultravox's audio-native models. It is built for low latency and handles both text and audio natively, which makes it well suited to real-time voice applications.

Installation

To use Ultravox Realtime, install the required dependencies:

pip install "piopiy-ai[ultravox]"

Prerequisites

An Ultravox API key.

Set your API key in your environment:

export ULTRAVOX_API_KEY="your_api_key_here"
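A missing or empty key is easier to diagnose at startup than in the middle of a call. The following is a minimal sketch of a fail-fast check; the helper name `load_ultravox_api_key` is hypothetical and not part of the library.

```python
import os


def load_ultravox_api_key(env_var: str = "ULTRAVOX_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not provided by piopiy-ai.
    """
    # Treat an unset variable and a whitespace-only value the same way.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before starting the service"
        )
    return key
```

The returned string can then be passed as `api_key` in place of a bare `os.getenv(...)` call, which would silently return `None` when the variable is unset.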

Configuration

This service supports three distinct initialization modes, selected by the type of object passed as the `params` argument.

1. One-Shot Mode (`OneShotInputParams`)

Used for single, independent calls with custom prompts and voices.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | Required | Your Ultravox API key. |
| `system_prompt` | `str` | `None` | Instructions that guide the model's behavior. |
| `temperature` | `float` | `0.0` | Sampling randomness (0.0 to 1.0). |
| `model` | `str` | `None` | Model ID (e.g., `"fixie-ai/ultravox"`). |
| `voice` | `UUID` | `None` | Specific voice ID to use. |
| `max_duration` | `timedelta` | `1h` | Maximum call duration. |
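The table above can be read as a set of keyword arguments with a few constraints worth checking before a call starts. This is a sketch only: `one_shot_kwargs` is a hypothetical helper, the range check mirrors the documented 0.0 to 1.0 bounds, and whether the library performs the same validation itself is not confirmed here.

```python
from datetime import timedelta
from typing import Any, Dict, Optional
from uuid import UUID


def one_shot_kwargs(
    api_key: str,
    system_prompt: Optional[str] = None,
    temperature: float = 0.0,
    model: Optional[str] = None,
    voice: Optional[str] = None,
    max_duration: timedelta = timedelta(hours=1),
) -> Dict[str, Any]:
    """Build keyword arguments matching the documented One-Shot parameters.

    Hypothetical helper for illustration; not provided by piopiy-ai.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if max_duration <= timedelta(0):
        raise ValueError("max_duration must be positive")
    return {
        "api_key": api_key,
        "system_prompt": system_prompt,
        "temperature": temperature,
        "model": model,
        # The table lists `voice` as a UUID, so parse the string form early.
        "voice": UUID(voice) if voice is not None else None,
        "max_duration": max_duration,
    }
```

Assuming the parameter names in the table match the constructor, the result could be unpacked as `OneShotInputParams(**one_shot_kwargs(...))`.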

2. Agent Mode (`AgentInputParams`)

Used for calling pre-configured agents defined in the Ultravox console.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | Required | Your Ultravox API key. |
| `agent_id` | `UUID` | Required | The ID of your pre-defined agent. |
| `template_context` | `dict` | `{}` | Variables for the agent template. |
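Agent IDs usually arrive as strings (from a config file or environment variable), while the table lists `agent_id` as a `UUID`. The sketch below converts and validates the ID before use; `agent_kwargs` and the `customer_name` template variable are hypothetical and only illustrate the shape of the documented parameters.

```python
from typing import Any, Dict, Optional
from uuid import UUID


def agent_kwargs(
    api_key: str,
    agent_id: str,
    template_context: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments matching the documented Agent Mode parameters.

    Hypothetical helper for illustration; not provided by piopiy-ai.
    """
    return {
        "api_key": api_key,
        # UUID() raises ValueError on a malformed ID, so typos surface early.
        "agent_id": UUID(agent_id),
        # Copy the mapping so later changes by the caller do not leak in.
        "template_context": dict(template_context or {}),
    }


# Example with a placeholder ID and a made-up template variable.
example = agent_kwargs(
    api_key="your_api_key_here",
    agent_id="123e4567-e89b-12d3-a456-426614174000",
    template_context={"customer_name": "Ada"},
)
```

Assuming the constructor accepts these names as keywords, the result could be passed as `AgentInputParams(**example)`.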

3. Join Mode (`JoinUrlInputParams`)

Used to join an existing session via a provided URL.
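Since this mode depends entirely on a URL supplied from elsewhere, a basic sanity check before connecting can catch truncated or empty values. The sketch below only verifies that the string parses with a scheme and a host; the helper name is hypothetical, and the exact URL format and the parameter name expected by `JoinUrlInputParams` are not specified in this document.

```python
from urllib.parse import urlparse


def looks_like_join_url(url: str) -> bool:
    """Return True if `url` has both a scheme and a host.

    Hypothetical helper for illustration; it does not check that the
    session exists or that the URL is one Ultravox would accept.
    """
    parsed = urlparse(url.strip())
    return bool(parsed.scheme) and bool(parsed.netloc)
```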

Usage

One-Shot Example

import os
from piopiy.services.ultravox.llm import UltravoxRealtimeLLMService, OneShotInputParams

# Create the service in One-Shot mode with a custom prompt and model.
llm = UltravoxRealtimeLLMService(
    params=OneShotInputParams(
        # Read the key exported in the Prerequisites step.
        api_key=os.getenv("ULTRAVOX_API_KEY"),
        system_prompt="You are a helpful travel assistant.",
        model="fixie-ai/ultravox",
    )
)

Notes

Audio-Native: Unlike standard LLMs, Ultravox is an audio-native model. It still provides text transcriptions, but it interprets the user's intent directly from the raw audio signal.
Latency: Because audio goes straight into the model, the service avoids many of the hand-offs between separate stages that add latency in a standard STT-LLM-TTS pipeline.
Tool Usage: Tool calling is supported natively in One-Shot mode via the `one_shot_selected_tools` parameter.
