Ultravox Realtime LLM
The UltravoxRealtimeLLMService provides access to Ultravox's audio-native models. It is built for low latency and handles both text and audio natively, which suits real-time voice applications.
Installation
To use Ultravox Realtime, install the required dependencies:
pip install "piopiy-ai[ultravox]"
Prerequisites
- An Ultravox API key.
- Set your API key in your environment:
export ULTRAVOX_API_KEY="your_api_key_here"
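Before constructing the service, it can help to confirm that the key is actually visible to Python. The sketch below is a fail-fast check; `require_api_key` is a hypothetical helper and not part of piopiy-ai. Only the `ULTRAVOX_API_KEY` variable name comes from the step above.

```python
import os


def require_api_key(name: str = "ULTRAVOX_API_KEY") -> str:
    """Return the key stored in environment variable `name`; raise if unset or blank."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the app")
    return key
```

Calling `require_api_key()` at startup surfaces a missing key immediately, whereas `os.getenv` would silently return `None`.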
Configuration
This service supports three distinct initialization modes via the params argument.
1. One-Shot Mode (OneShotInputParams)
Used for single, independent calls with custom prompts and voices.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | Required | Your Ultravox API key. |
| system_prompt | str | None | Guide the model's behavior. |
| temperature | float | 0.0 | Sampling randomness (0.0 to 1.0). |
| model | str | None | Model ID (e.g., "fixie-ai/ultravox"). |
| voice | UUID | None | Specific voice ID to use. |
| max_duration | timedelta | 1h | Maximum call duration. |
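
The types in the table map onto standard-library values: `voice` is a `uuid.UUID` and `max_duration` is a `datetime.timedelta`. The sketch below collects them into keyword arguments and checks the documented temperature range. `one_shot_kwargs` is a hypothetical helper, the voice ID is a placeholder, and whether `OneShotInputParams` performs this validation itself is not stated here.

```python
from datetime import timedelta
from uuid import UUID


def one_shot_kwargs(api_key, system_prompt=None, temperature=0.0,
                    model=None, voice=None, max_duration=timedelta(hours=1)):
    """Collect One-Shot parameters, mirroring the defaults in the table above."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    return {
        "api_key": api_key,
        "system_prompt": system_prompt,
        "temperature": temperature,
        "model": model,
        # Accept either a UUID or its string form; None leaves the voice unset.
        "voice": UUID(str(voice)) if voice is not None else None,
        "max_duration": max_duration,
    }


kwargs = one_shot_kwargs(
    api_key="your_api_key_here",
    system_prompt="You are a helpful travel assistant.",
    temperature=0.3,
    voice="00000000-0000-0000-0000-000000000000",  # placeholder, not a real voice ID
    max_duration=timedelta(minutes=15),
)
# With piopiy-ai installed, these would be passed as OneShotInputParams(**kwargs).
```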
2. Agent Mode (AgentInputParams)
Used for calling pre-configured agents defined in the Ultravox console.
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | Required | Your Ultravox API key. |
| agent_id | UUID | Required | The ID of your pre-defined agent. |
| template_context | dict | {} | Variables for the agent template. |
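
Agent Mode has no usage example on this page, so the following is a minimal sketch of how the parameters in the table might be assembled. It assumes `AgentInputParams` is importable from the same module as `OneShotInputParams` and accepts the table's fields as keyword arguments. The helper names, the agent ID, and the `customer_name` template variable are all placeholders.

```python
from uuid import UUID


def agent_kwargs(api_key, agent_id, template_context=None):
    """Collect Agent Mode parameters; template_context defaults to an empty dict."""
    return {
        "api_key": api_key,
        "agent_id": UUID(str(agent_id)),  # raises ValueError on a malformed ID
        "template_context": dict(template_context or {}),
    }


def make_agent_service(api_key, agent_id, template_context=None):
    # Assumed import path (same module as OneShotInputParams). Imported lazily
    # so agent_kwargs can be used without piopiy-ai installed.
    from piopiy.services.ultravox.llm import AgentInputParams, UltravoxRealtimeLLMService

    params = AgentInputParams(**agent_kwargs(api_key, agent_id, template_context))
    return UltravoxRealtimeLLMService(params=params)


kwargs = agent_kwargs(
    "your_api_key_here",
    "00000000-0000-0000-0000-000000000000",  # placeholder agent ID
    {"customer_name": "Ada"},  # hypothetical template variable
)
```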
3. Join Mode (JoinUrlInputParams)
Used to join an existing session via a provided URL.
Usage
One-Shot Example
import os

from piopiy.services.ultravox.llm import UltravoxRealtimeLLMService, OneShotInputParams

llm = UltravoxRealtimeLLMService(
    params=OneShotInputParams(
        api_key=os.getenv("ULTRAVOX_API_KEY"),
        system_prompt="You are a helpful travel assistant.",
        model="fixie-ai/ultravox",
    )
)
Notes
- Audio-Native: Unlike standard LLMs, Ultravox is an audio-native model. While it provides text transcriptions, it understands the user's intent directly from the raw audio signal.
- Latency: Because the model works on audio directly, this service avoids many of the bottlenecks inherent in standard STT-LLM-TTS pipelines, which keeps voice-to-voice latency low.
- Tool Usage: Tool calling is supported natively in One-Shot mode via the one_shot_selected_tools parameter.