Whisper (Local)
The WhisperSTTService allows you to run OpenAI's Whisper models locally on your own hardware. It supports multiple backends, including Faster Whisper for general CPUs/GPUs and MLX Whisper, which is optimized for Apple Silicon.
Installation
Depending on your hardware, choose the appropriate installation:
For General Hardware (Faster Whisper)
```shell
pip install "piopiy-ai[whisper]"
```
For Apple Silicon (MLX)
```shell
pip install "piopiy-ai[mlx-whisper]"
```
Configuration
WhisperSTTService (General) Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str \| Model | Model.DISTIL_MEDIUM_EN | Model size (e.g., base, medium, large-v3). |
| device | str | "auto" | Device to run on (cpu, cuda, auto). |
| compute_type | str | "default" | Precision (int8, float16, etc.). |
| language | Language | Language.EN | Transcription language. |
WhisperSTTServiceMLX (Apple Silicon) Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str \| MLXModel | MLXModel.TINY | MLX model repository/ID. |
| temperature | float | 0.0 | Sampling temperature. |
Usage
Basic Setup (Faster Whisper)
```python
from piopiy.services.whisper.stt import WhisperSTTService, Model
from piopiy.transcriptions.language import Language

stt = WhisperSTTService(
    model=Model.BASE,
    device="cpu",  # or "cuda" for NVIDIA GPUs
    language=Language.EN,
)
```
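When an NVIDIA GPU is available, the same parameter table above can be used to select CUDA with reduced precision. The following is a sketch combining the documented model, device, and compute_type options; the Model.LARGE_V3 enum member is an assumption mirroring the "large-v3" model size, so verify it against your installed version:

```python
from piopiy.services.whisper.stt import WhisperSTTService, Model
from piopiy.transcriptions.language import Language

# Sketch: large-v3 on an NVIDIA GPU with float16 precision.
# "device" and "compute_type" values come from the parameter table above.
stt = WhisperSTTService(
    model=Model.LARGE_V3,    # assumed enum member for the "large-v3" size
    device="cuda",           # requires CUDA libraries (see Notes)
    compute_type="float16",  # lower-precision inference to reduce GPU memory use
    language=Language.EN,
)
```

Using float16 (or int8) trades a small amount of accuracy for significantly lower memory use and faster inference on supported GPUs.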
Apple Silicon Optimized (MLX)
```python
from piopiy.services.whisper.stt import WhisperSTTServiceMLX, MLXModel
from piopiy.transcriptions.language import Language

stt = WhisperSTTServiceMLX(
    model=MLXModel.LARGE_V3_TURBO,
    language=Language.EN,
)
```
Notes
- Initial Run: The first time you run a specific model, Piopiy downloads its weights from Hugging Face; expect several hundred megabytes to several gigabytes, depending on model size.
- Privacy: Local Whisper processing ensures that your audio data never leaves your server.
- HW Acceleration: For NVIDIA GPUs, ensure you have the appropriate CUDA libraries installed to use device="cuda".
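Before enabling device="cuda", you can confirm the driver can see your GPU with a quick check (assumes the NVIDIA driver is installed; this is a generic diagnostic, not a Piopiy command):

```shell
# Lists visible NVIDIA GPUs plus the installed driver and CUDA versions
nvidia-smi
```

If no devices are listed here, Faster Whisper will not be able to use CUDA either, so fall back to device="cpu" or fix the driver installation first.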