Speech-to-Text (STT)

Overview

STT services transcribe the live audio stream into text. Piopiy supports multiple high-performance STT providers out of the box.

Supported Providers

Deepgram

The recommended choice for production due to its low latency and high accuracy.

from piopiy.services.deepgram.stt import DeepgramSTTService

stt = DeepgramSTTService(
    api_key="YOUR_DEEPGRAM_KEY",
    model="nova-2",
    language="en-US",
)

Google Speech-to-Text

Reliable, with support for a wide range of languages.

from piopiy.services.google.stt import GoogleSTTService

stt = GoogleSTTService(
    credentials_path="path/to/creds.json",
    language_code="en-US",
)

Advanced Features

  • Punctuation & Diarization: Most providers support automatic punctuation; speaker diarization support varies, so check your provider's documentation.
  • Language Detection: Automatically detect the caller's language.
  • Interim Results: Piopiy uses interim results to start LLM processing even before the user finishes speaking, reducing perceived latency.
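
The interim-results idea can be illustrated with a small, self-contained sketch. It is not Piopiy's internal implementation: `TranscriptEvent`, `InterimAggregator`, and the two callbacks are hypothetical names, and the event shape (a text plus an `is_final` flag) is an assumption modeled on how streaming STT providers commonly report results. The sketch forwards each changed interim transcript to an early-processing callback and commits the final transcript once the provider marks the utterance as finished.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TranscriptEvent:
    """Hypothetical event shape: one STT result for the current utterance."""
    text: str
    is_final: bool


@dataclass
class InterimAggregator:
    """Forwards interim text early, then commits the final transcript."""
    on_partial: Callable[[str], None]
    on_final: Callable[[str], None]
    _last_partial: str = field(default="", init=False)

    def handle(self, event: TranscriptEvent) -> None:
        text = event.text.strip()
        if not text:
            return  # ignore empty results
        if event.is_final:
            # The utterance is complete: reset state and commit.
            self._last_partial = ""
            self.on_final(text)
        elif text != self._last_partial:
            # Only forward interim text that actually changed.
            self._last_partial = text
            self.on_partial(text)


# Usage: simulate a stream of interim results followed by a final one.
partials: List[str] = []
finals: List[str] = []
agg = InterimAggregator(on_partial=partials.append, on_final=finals.append)
for ev in [
    TranscriptEvent("book a", False),
    TranscriptEvent("book a", False),  # duplicate interim, skipped
    TranscriptEvent("book a table", False),
    TranscriptEvent("Book a table for two.", True),
]:
    agg.handle(ev)
```

In a real pipeline the partial callback might start speculative LLM work that is discarded if the final transcript differs; that policy is left out here.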

Best Practices

  • Model Selection: Prefer a provider's latest or distilled models (such as Deepgram's nova-2) for the best balance of speed and accuracy.
  • Audio Sample Rate: Match your STT sample rate to your transport (typically 8 kHz for traditional telephony, 16 kHz for wideband audio).
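
When the transport delivers audio at one rate and the STT service is configured for another, the audio has to be resampled before it is sent. The following is a minimal sketch, assuming mono 16-bit PCM in native byte order; `resample_pcm16` is a hypothetical helper, not part of Piopiy. It uses plain linear interpolation with no anti-aliasing filter, so it is meant only to show the idea; a production pipeline would more likely use a dedicated resampling library.

```python
from array import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample mono 16-bit PCM by linear interpolation (native byte order)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    samples = array("h")
    samples.frombytes(pcm)
    if src_rate == dst_rate or len(samples) == 0:
        return pcm  # nothing to do
    out_len = len(samples) * dst_rate // src_rate
    out = array("h")
    for i in range(out_len):
        # Position of output sample i on the input timeline.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out.tobytes()


# Usage: upsample three 8 kHz samples to 16 kHz.
narrowband = array("h", [0, 100, 200]).tobytes()
wideband = array("h")
wideband.frombytes(resample_pcm16(narrowband, 8000, 16000))
```

Upsampling this way does not add information above the original bandwidth; it only makes the stream acceptable to a service that expects the higher rate.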