Text-to-Speech (TTS)
Overview
TTS services convert the LLM's text output back into natural-sounding audio for the caller.
Supported Providers
Cartesia
State-of-the-art low-latency voice synthesis.
from piopiy.services.cartesia.tts import CartesiaTTSService

tts = CartesiaTTSService(
    api_key="YOUR_CARTESIA_KEY",
    voice_id="9581E290-7F04-4B24-A619-216CC9C6C1B1",
)
ElevenLabs
High-quality, realistic voices with emotional depth.
from piopiy.services.elevenlabs.tts import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key="YOUR_ELEVENLABS_KEY",
    voice_id="pNInz6ovfV9PZpgP7Dsn",
)
Deepgram TTS (Aura)
Optimized for high-speed streaming conversations.
from piopiy.services.deepgram.tts import DeepgramTTSService

tts = DeepgramTTSService(
    api_key="YOUR_DEEPGRAM_KEY",
    voice_id="aura-asteria-en",
)
Performance Features
- Audio Streaming: Audio is streamed in chunks, so playback can start before the full response has been synthesized.
- Voice Customization: Speed, pitch, and stability can be adjusted; the available controls vary by provider.
- Caching: Some providers cache frequently used phrases, which can reduce cost and latency.
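To make the chunked-streaming idea concrete, here is a minimal sketch of how raw PCM audio could be split into fixed-duration frames. It is not part of the piopiy API: the helper names `frame_size_bytes` and `iter_frames` are hypothetical, and the sketch assumes 16-bit mono PCM with 20 ms frames, a common but not universal choice.

```python
from typing import Iterator


def frame_size_bytes(sample_rate: int, frame_ms: int = 20,
                     sample_width: int = 2, channels: int = 1) -> int:
    """Return the number of bytes in one frame of raw PCM audio."""
    samples_per_frame = sample_rate * frame_ms // 1000
    return samples_per_frame * sample_width * channels


def iter_frames(pcm: bytes, frame_bytes: int) -> Iterator[bytes]:
    """Yield consecutive frames; the final frame may be shorter."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


# One second of silence at 24 kHz, 16-bit mono (assumed format).
one_second = bytes(frame_size_bytes(24000, frame_ms=1000))
frames = list(iter_frames(one_second, frame_size_bytes(24000)))
print(len(frames), len(frames[0]))  # 50 960
```

Because each frame can be sent as soon as it is available, the caller hears the first 20 ms without waiting for the rest of the utterance.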
Best Practices
- Sample Rate: Ensure the TTS output sample rate matches what your transport expects (24 kHz is common). Audio played at the wrong rate sounds sped up or slowed down, with shifted pitch.
- Voice Selection: Choose voices that match your agent's persona for a better user experience.
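If the TTS output rate and the transport rate differ, the audio has to be resampled somewhere in the pipeline. The sketch below illustrates the idea with plain linear interpolation on 16-bit mono PCM. The function `resample_pcm16` is a hypothetical helper, not a piopiy function, and the 8 kHz target is only an assumed example of a telephony rate. Linear interpolation applies no low-pass filter, so downsampling this way can alias; a production pipeline would normally use a dedicated resampler.

```python
from array import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample 16-bit mono PCM (native byte order) by linear interpolation."""
    if src_rate == dst_rate:
        return pcm
    src = array("h")
    src.frombytes(pcm)
    if not src:
        return b""
    dst_len = len(src) * dst_rate // src_rate
    dst = array("h")
    last = len(src) - 1
    for i in range(dst_len):
        # Position of output sample i on the source timeline,
        # kept as an integer part plus a fraction to avoid float drift.
        num = i * src_rate
        left = min(num // dst_rate, last)
        right = min(left + 1, last)
        frac = (num % dst_rate) / dst_rate
        dst.append(round(src[left] * (1 - frac) + src[right] * frac))
    return dst.tobytes()


# A 20 ms frame of silence at 24 kHz (960 bytes) converted to 8 kHz.
telephony_frame = resample_pcm16(bytes(960), 24000, 8000)
print(len(telephony_frame))  # 320
```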