Speech-to-Text (STT)

Overview

STT services transcribe the live audio stream into text. Piopiy supports multiple high-performance STT providers out of the box.

Supported Providers

Deepgram

The recommended choice for production due to its low latency and high accuracy.

from piopiy.services.deepgram.stt import DeepgramSTTService

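stt = DeepgramSTTService(
    api_key="YOUR_DEEPGRAM_KEY",  # placeholder; load the real key from an environment variable or secret store
    model="nova-2",               # Deepgram model name
    language="en-US",             # BCP 47 language tag
)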

Google Speech-to-Text

Reliable, with support for a wide range of languages.

from piopiy.services.google.stt import GoogleSTTService

stt = GoogleSTTService(
    credentials_path="path/to/creds.json",  # path to your Google Cloud credentials JSON file
    language_code="en-US",                  # BCP 47 language tag
)

Advanced Features

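Punctuation & Diarization: Most providers support automatic punctuation. Diarization (labeling which speaker said what) depends on the provider and model, so check the provider's documentation before relying on it.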
Language Detection: Automatically detect the caller's language.
Interim Results: Piopiy uses interim results to start LLM processing even before the user finishes speaking, reducing perceived latency.
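
To illustrate the interim-results idea, here is a minimal sketch of how an application might separate interim hypotheses from final transcripts. It does not use Piopiy's API: `TranscriptEvent`, `InterimAwareBuffer`, and the `is_final` flag are hypothetical names chosen for this example, and the actual event shape exposed by a given STT service may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranscriptEvent:
    """Hypothetical transcript event (not a Piopiy type)."""
    text: str
    is_final: bool


@dataclass
class InterimAwareBuffer:
    """Tracks the pending interim hypothesis and the committed final utterances."""
    finals: List[str] = field(default_factory=list)
    interim: str = ""

    def feed(self, event: TranscriptEvent) -> Optional[str]:
        """Return text worth acting on now, or None if nothing changed."""
        text = event.text.strip()
        if event.is_final:
            # A final result supersedes any pending interim hypothesis.
            self.interim = ""
            if text:
                self.finals.append(text)
            return text or None
        if text and text != self.interim:
            # The hypothesis changed: a caller could start (or restart)
            # speculative LLM work here, before the user stops speaking.
            self.interim = text
            return text
        return None

    def current_text(self) -> str:
        """Committed text followed by the pending interim hypothesis, if any."""
        parts = self.finals + ([self.interim] if self.interim else [])
        return " ".join(parts)
```

In this sketch, any work started from an interim hypothesis should be treated as provisional, since a later interim or final result may revise the text.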

Best Practices

Model Selection: Prefer a provider's current general-purpose or speed-optimized models (such as nova-2) for the best balance of speed and accuracy.
Audio Sample Rate: Match your STT sample rate to your transport's audio format (commonly 8 kHz for narrowband telephony and 16 kHz for wideband audio).
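
As a rough illustration of the sample-rate advice, the sketch below checks that two rates agree and, if they do not, converts mono 16-bit PCM with naive linear interpolation. The function names are hypothetical and this is not how Piopiy handles audio internally. For production audio a proper resampling library is a better fit, since this approach applies no low-pass filter and will alias when downsampling.

```python
from array import array


def check_sample_rates(transport_hz: int, stt_hz: int) -> None:
    """Raise ValueError if the transport and STT sample rates differ."""
    if transport_hz != stt_hz:
        raise ValueError(
            f"Sample rate mismatch: transport={transport_hz} Hz, STT={stt_hz} Hz"
        )


def resample_linear(samples: array, src_hz: int, dst_hz: int) -> array:
    """Resample mono signed 16-bit PCM using linear interpolation.

    Illustration only: no anti-aliasing filter is applied.
    """
    if src_hz <= 0 or dst_hz <= 0:
        raise ValueError("Sample rates must be positive")
    if src_hz == dst_hz or len(samples) == 0:
        return array("h", samples)

    out_len = len(samples) * dst_hz // src_hz
    last = len(samples) - 1
    out = array("h")
    for i in range(out_len):
        # Position of output sample i on the input timeline.
        pos = i * src_hz / dst_hz
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left if left < last else 0.0
        value = samples[left] + (samples[right] - samples[left]) * frac
        out.append(int(round(value)))
    return out
```

For example, `resample_linear(pcm, 8000, 16000)` would upsample hypothetical 8 kHz input to 16 kHz before it is handed to an STT service configured for 16 kHz.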
