Text-to-Speech (TTS)

Overview

TTS services convert the LLM's text responses into natural-sounding speech that is played back to the caller.

Supported Providers

Cartesia

Low-latency voice synthesis, well suited to real-time conversation.

from piopiy.services.cartesia.tts import CartesiaTTSService

tts = CartesiaTTSService(
    api_key="YOUR_CARTESIA_KEY",
    voice_id="9581E290-7F04-4B24-A619-216CC9C6C1B1"
)

ElevenLabs

High-quality, realistic voices with emotional depth.

from piopiy.services.elevenlabs.tts import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key="YOUR_ELEVENLABS_KEY",
    voice_id="pNInz6ovfV9PZpgP7Dsn"
)

Deepgram TTS (Aura)

Optimized for low-latency, real-time streaming conversations.

from piopiy.services.deepgram.tts import DeepgramTTSService

tts = DeepgramTTSService(
    api_key="YOUR_DEEPGRAM_KEY",
    voice_id="aura-asteria-en"
)

Performance Features

Audio Streaming: Audio is streamed in chunks, so playback can begin before the full response has been synthesized.
Voice Customization: Depending on the provider, you can adjust parameters such as speed, pitch, and stability.
Caching: Some providers can cache frequently used phrases, which reduces cost and latency.
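The streaming and caching points above can be illustrated with a small, provider-agnostic sketch. `fake_synthesize_stream` and `PhraseCache` are hypothetical names, not part of piopiy: the first stands in for a provider's streaming call by yielding silent 16-bit PCM chunks, and the second is a client-side cache that replays audio for phrases it has already synthesized. Note that client-side caching is a different technique from the provider-side caching described above; it is shown only to make the idea concrete.

```python
from typing import Dict, Iterator, List


def fake_synthesize_stream(text: str, chunk_ms: int = 20,
                           sample_rate: int = 24000) -> Iterator[bytes]:
    """Hypothetical stand-in for a provider's streaming TTS call.

    Yields one chunk of 16-bit mono silence per word so the example runs
    offline; a real service would return synthesized audio over the network.
    """
    bytes_per_chunk = sample_rate * chunk_ms // 1000 * 2  # 2 bytes per sample
    for _word in text.split():
        yield b"\x00" * bytes_per_chunk


class PhraseCache:
    """Hypothetical client-side cache that replays audio for repeated phrases."""

    def __init__(self) -> None:
        self._store: Dict[str, List[bytes]] = {}
        self.hits = 0

    def stream(self, text: str) -> Iterator[bytes]:
        cached = self._store.get(text)
        if cached is not None:
            self.hits += 1
            yield from cached  # replay without calling the provider again
            return
        chunks: List[bytes] = []
        for chunk in fake_synthesize_stream(text):
            chunks.append(chunk)
            yield chunk  # hand each chunk to playback as soon as it exists
        self._store[text] = chunks  # cache only after a complete synthesis


cache = PhraseCache()
greeting = "Hello, how can I help you today?"
sent = []
for chunk in cache.stream(greeting):
    sent.append(chunk)  # a real transport would forward this chunk right away
replayed = list(cache.stream(greeting))  # second request is served from cache
print(len(sent), cache.hits)  # → 7 1
```

Because each chunk is yielded as soon as it is produced, a transport can forward the first chunk to the caller while later ones are still being generated. The cache stores a phrase only once it has been streamed to completion, so an interrupted response is never replayed partially.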

Best Practices

Sample Rate: Make sure the TTS output sample rate matches what your transport expects (24 kHz is common).
Voice Selection: Choose voices that match your agent's persona for a better user experience.
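When the provider's output rate and the transport's rate differ, the audio has to be resampled somewhere in the pipeline. The sketch below shows the idea with a hypothetical `resample_pcm16` helper (not a piopiy API) that converts 16-bit mono PCM by linear interpolation, for example from 24 kHz down to 8 kHz, the rate used by G.711 telephony codecs. Linear interpolation without a low-pass filter introduces aliasing when downsampling, so treat this as illustrative only; where a provider lets you request a specific output rate, doing so avoids the conversion altogether.

```python
from array import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Hypothetical helper: resample 16-bit mono PCM via linear interpolation.

    Uses the machine's native byte order (little-endian on most hardware).
    No anti-aliasing filter is applied, so downsampling may alias.
    """
    if src_rate == dst_rate:
        return pcm  # rates already match; nothing to do
    samples = array("h")
    samples.frombytes(pcm)
    n_src = len(samples)
    if n_src == 0:
        return b""
    n_dst = n_src * dst_rate // src_rate  # output length at the new rate
    step = src_rate / dst_rate            # source samples per output sample
    out = array("h")
    for i in range(n_dst):
        pos = i * step
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < n_src else samples[j]
        out.append(int(round(samples[j] + (nxt - samples[j]) * frac)))
    return out.tobytes()


# A 20 ms chunk at 24 kHz holds 480 samples; at 8 kHz it holds 160.
chunk_24k = array("h", [0] * 480).tobytes()
chunk_8k = resample_pcm16(chunk_24k, 24000, 8000)
print(len(chunk_8k) // 2)  # → 160
```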
