Speechmatics
`SpeechmaticsSTTService` provides enterprise-grade, real-time speech-to-text using Speechmatics' Autonomous Speech Recognition (ASR). It features high accuracy, low latency, and advanced speaker diarization (identifying who spoke when).
Installation
To use Speechmatics, install the required dependencies:
```bash
pip install "piopiy-ai[speechmatics]"
```
Prerequisites
- A Speechmatics account and API key.
- Set your API key in your environment:
```bash
export SPEECHMATICS_API_KEY="your_api_key_here"
```
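A missing key may only surface once the service tries to connect, so it can help to check for it at startup. The following is a minimal sketch, assuming the key is read from `SPEECHMATICS_API_KEY` as exported above; `require_api_key` is a hypothetical helper and not part of piopiy.

```python
import os


def require_api_key(var_name: str = "SPEECHMATICS_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the piopiy package.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting the service."
        )
    return key
```

The returned value can then be passed as `api_key` when constructing the service.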
Configuration
SpeechmaticsSTTService Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `None` | Speechmatics API key (defaults to the environment variable). |
| `base_url` | `str` | `None` | WebSocket URL for Speechmatics ASR. |
| `sample_rate` | `int` | `None` | Audio sample rate in Hz. |
| `params` | `InputParams` | `InputParams()` | Advanced recognition and diarization settings. |
InputParams
| Parameter | Type | Default | Description |
|---|---|---|---|
| `language` | `Language` | `Language.EN` | Language code for transcription. |
| `operating_point` | `str` | `"enhanced"` | Accuracy vs. latency tradeoff (`standard` or `enhanced`). |
| `enable_diarization` | `bool` | `False` | Enable speaker diarization (speaker labels in transcripts). |
| `turn_detection_mode` | `str` | `"fixed"` | End-of-turn detection mode (`fixed`, `adaptive`, or `smart_turn`). |
| `max_delay` | `float` | `None` | Maximum delay, in seconds, before transcription results are returned. |
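The string options in the table accept only a few values, so a typo such as `"enhance"` is easy to make. The sketch below checks values against those listed above before they reach `InputParams`. The function `validate_options` and the two sets are hypothetical; they mirror only this table and are not the library's own validation.

```python
# Allowed values, copied from the InputParams table above (assumption:
# the table is complete for these two options).
OPERATING_POINTS = {"standard", "enhanced"}
TURN_DETECTION_MODES = {"fixed", "adaptive", "smart_turn"}


def validate_options(operating_point: str = "enhanced",
                     turn_detection_mode: str = "fixed") -> dict:
    """Return the options as a dict, or raise ValueError on an unknown value."""
    if operating_point not in OPERATING_POINTS:
        raise ValueError(
            f"operating_point must be one of {sorted(OPERATING_POINTS)}, "
            f"got {operating_point!r}"
        )
    if turn_detection_mode not in TURN_DETECTION_MODES:
        raise ValueError(
            f"turn_detection_mode must be one of {sorted(TURN_DETECTION_MODES)}, "
            f"got {turn_detection_mode!r}"
        )
    return {
        "operating_point": operating_point,
        "turn_detection_mode": turn_detection_mode,
    }
```

Assuming `InputParams` accepts the string forms shown in the table, the returned dict could be unpacked into it as keyword arguments.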
Usage
Basic Setup
```python
import os

from piopiy.services.speechmatics.stt import SpeechmaticsSTTService
from piopiy.transcriptions.language import Language

# Create the STT service with an explicit language and operating point.
stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    params=SpeechmaticsSTTService.InputParams(
        language=Language.EN,
        operating_point="enhanced",
    ),
)
```
With Smart Turn Detection
```python
import os

from piopiy.services.speechmatics.stt import SpeechmaticsSTTService, TurnDetectionMode

# Use ML-based end-of-turn detection instead of the default fixed mode.
stt = SpeechmaticsSTTService(
    api_key=os.getenv("SPEECHMATICS_API_KEY"),
    params=SpeechmaticsSTTService.InputParams(
        turn_detection_mode=TurnDetectionMode.SMART_TURN,
    ),
)
```
Notes
- Diarization: When `enable_diarization` is `True`, transcripts include speaker labels (e.g., `S1: Hello`).
- Endpointing: `SMART_TURN` uses ML models to detect when a user has finished speaking, providing a more natural interaction flow.
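If a downstream component needs the speaker separately from the text, the label can be split off. The following is a minimal sketch, assuming the transcript text arrives in the `S1: Hello` form shown above; the exact format the service emits may differ, and `split_speaker` is a hypothetical helper, not part of piopiy.

```python
import re
from typing import Optional, Tuple

# Matches a leading label such as "S1:" or "S12:" followed by the utterance.
_SPEAKER_PREFIX = re.compile(r"^(S\d+):\s*(.*)$", re.DOTALL)


def split_speaker(text: str) -> Tuple[Optional[str], str]:
    """Split 'S1: Hello' into ('S1', 'Hello').

    Returns (None, text) unchanged when no speaker label is present.
    """
    match = _SPEAKER_PREFIX.match(text)
    if match is None:
        return None, text
    return match.group(1), match.group(2)
```

Text without a label passes through untouched, so the same helper works whether or not diarization is enabled.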