Speech-to-Text (STT)
Overview
STT services transcribe the live audio stream into text. Piopiy supports multiple high-performance STT providers out of the box.
Supported Providers
Deepgram
The recommended choice for production due to its low latency and high accuracy.
from piopiy.services.deepgram.stt import DeepgramSTTService
stt = DeepgramSTTService(
    api_key="YOUR_DEEPGRAM_KEY",
    model="nova-2",
    language="en-US"
)
Google Speech-to-Text
Reliable and supports a vast array of languages.
from piopiy.services.google.stt import GoogleSTTService
stt = GoogleSTTService(
    credentials_path="path/to/creds.json",
    language_code="en-US"
)
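Because each service takes only a few constructor arguments, you may prefer to choose the provider and load credentials from configuration instead of hard-coding keys in source. The sketch below is a minimal illustration under stated assumptions. `STT_PROVIDER` and `DEEPGRAM_API_KEY` are hypothetical environment variable names chosen for this example. `GOOGLE_APPLICATION_CREDENTIALS` is Google Cloud's standard variable for a service-account JSON file. The helper only builds the constructor arguments shown above, so the Piopiy imports stay in your own code.

```python
import os
from typing import Mapping


def stt_kwargs(env: Mapping[str, str] = os.environ) -> tuple[str, dict]:
    """Return (provider, constructor kwargs) for the STT service.

    Assumption: STT_PROVIDER and DEEPGRAM_API_KEY are hypothetical names for
    this example; GOOGLE_APPLICATION_CREDENTIALS is Google Cloud's standard
    variable pointing at a service-account JSON file.
    """
    provider = env.get("STT_PROVIDER", "deepgram").lower()
    if provider == "deepgram":
        key = env.get("DEEPGRAM_API_KEY")
        if not key:
            raise RuntimeError("DEEPGRAM_API_KEY is not set")
        # Same arguments as the DeepgramSTTService example above.
        return provider, {"api_key": key, "model": "nova-2", "language": "en-US"}
    if provider == "google":
        creds = env.get("GOOGLE_APPLICATION_CREDENTIALS")
        if not creds:
            raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
        # Same arguments as the GoogleSTTService example above.
        return provider, {"credentials_path": creds, "language_code": "en-US"}
    raise ValueError(f"Unsupported STT provider: {provider!r}")


# Usage (requires Piopiy to be installed):
# provider, kwargs = stt_kwargs()
# if provider == "deepgram":
#     from piopiy.services.deepgram.stt import DeepgramSTTService
#     stt = DeepgramSTTService(**kwargs)
# else:
#     from piopiy.services.google.stt import GoogleSTTService
#     stt = GoogleSTTService(**kwargs)
```

Passing the environment as a mapping keeps the selection logic testable without touching real credentials.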
Advanced Features
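- Punctuation & Diarization: Most providers can add punctuation automatically, and some can also label which speaker is talking (diarization).
- Language Detection: Where the provider supports it, automatically detect the caller's language.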
- Interim Results: Piopiy uses interim results to start LLM processing even before the user finishes speaking, reducing perceived latency.
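The interim-results idea can be illustrated with a small, self-contained simulation. This is a sketch of the general speculative technique, not Piopiy's internal implementation. `TranscriptEvent`, `SpeculativeResponder`, and the `generate` callback are hypothetical names, and `generate` stands in for an LLM call. Work starts on each new partial transcript. If the final transcript matches the last partial one, the reply that was already prepared is returned without another call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TranscriptEvent:
    text: str
    is_final: bool  # False for interim (partial) results, True for the final one


@dataclass
class SpeculativeResponder:
    """Hypothetical helper, not part of Piopiy: drafts a reply from interim
    transcripts and reuses that draft if the final transcript matches."""

    generate: Callable[[str], str]  # stand-in for a real LLM call
    calls: int = field(default=0, init=False)
    _draft_text: str = field(default="", init=False, repr=False)
    _draft_reply: str = field(default="", init=False, repr=False)

    def on_event(self, event: TranscriptEvent) -> Optional[str]:
        text = event.text.strip()
        if not event.is_final:
            # Interim result: start work early on the newest partial text.
            if text and text != self._draft_text:
                self._draft_text = text
                self._draft_reply = self._call(text)
            return None
        # Final result: reuse the draft if the partial text was already complete,
        # otherwise generate from the final transcript.
        if text == self._draft_text:
            reply = self._draft_reply
        else:
            reply = self._call(text)
        self._draft_text, self._draft_reply = "", ""
        return reply

    def _call(self, text: str) -> str:
        self.calls += 1
        return self.generate(text)


# Usage: the reply is already drafted when the final transcript arrives.
bot = SpeculativeResponder(generate=lambda t: f"reply to: {t}")
events = [
    TranscriptEvent("what's the", is_final=False),
    TranscriptEvent("what's the weather", is_final=False),
    TranscriptEvent("what's the weather", is_final=True),
]
replies = [bot.on_event(e) for e in events]
print(replies)    # [None, None, "reply to: what's the weather"]
print(bot.calls)  # 2
```

A production system would usually debounce interim results or cancel in-flight requests rather than calling the model on every partial update, because interim text can change several times within one utterance.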
Best Practices
- Model Selection: Use "latest" or "distilled" models (such as nova-2) for the best balance of speed and accuracy.
- Audio Sample Rate: Match your STT sample rate to the sample rate of your transport (telephony audio is typically 8 kHz or 16 kHz, depending on the transport).
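Matching matters because raw PCM audio carries no sample rate of its own. If 8 kHz audio is declared as 16 kHz, each second of speech is interpreted as half a second, so the recognizer effectively hears it at double speed. When the rates cannot be configured to match, the audio has to be resampled. The sketch below assumes mono 16-bit samples held in a Python list. `resample_linear` and `duration_seconds` are hypothetical helpers, and linear interpolation without an anti-aliasing filter is a deliberate simplification. A polyphase resampler such as `scipy.signal.resample_poly` is a better fit for production.

```python
def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration the receiver assumes for num_samples at the declared rate."""
    return num_samples / sample_rate


def resample_linear(samples: list[int], src_rate: int, dst_rate: int) -> list[int]:
    """Naive linear-interpolation resampler for mono 16-bit PCM samples.

    Sketch only: no anti-aliasing low-pass filter is applied, so downsampling
    can introduce aliasing.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = round(len(samples) * dst_rate / src_rate)
    ratio = src_rate / dst_rate  # source samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * ratio                         # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        value = samples[left] + (samples[right] - samples[left]) * frac
        out.append(max(-32768, min(32767, round(value))))  # keep 16-bit range
    return out


# One second of 8 kHz audio mislabeled as 16 kHz is treated as 0.5 s.
print(duration_seconds(8000, 16000))                        # 0.5
print(resample_linear([0, 100, 200, 300], 8000, 16000))     # [0, 50, 100, 150, 200, 250, 300, 300]
```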