OpenAI Whisper (Local)

The WhisperSTTService allows you to run OpenAI's Whisper models locally on your own hardware. It supports multiple backends: Faster Whisper for general CPUs and GPUs, and MLX Whisper, which is optimized specifically for Apple Silicon.

Installation

Depending on your hardware, choose the appropriate installation:

For General Hardware (Faster Whisper)

pip install "piopiy-ai[whisper]"

For Apple Silicon (MLX)

pip install "piopiy-ai[mlx-whisper]"

Configuration

WhisperSTTService (General) Parameters

| Parameter      | Type             | Default            | Description                                      |
| -------------- | ---------------- | ------------------ | ------------------------------------------------ |
| `model`        | `str \| Model`   | `DISTIL_MEDIUM_EN` | Model size (e.g., `base`, `medium`, `large-v3`). |
| `device`       | `str`            | `"auto"`           | Device to run on (`cpu`, `cuda`, `auto`).        |
| `compute_type` | `str`            | `"default"`        | Precision (`int8`, `float16`, etc.).             |
| `language`     | `Language`       | `EN`               | Transcription language.                          |
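As a rough illustration of how `device` and `compute_type` typically pair up, here is a minimal sketch (the helper is hypothetical, not part of piopiy): `int8` quantization keeps memory usage low on CPUs, while `float16` takes advantage of half-precision support on NVIDIA GPUs.

```python
# Hypothetical helper (not a piopiy API): maps a device choice to a
# commonly used Faster Whisper compute_type.
def pick_compute_type(device: str) -> str:
    # int8 keeps memory low on CPU; float16 uses GPU half-precision;
    # anything else falls back to the library default.
    return {"cpu": "int8", "cuda": "float16"}.get(device, "default")

print(pick_compute_type("cpu"))   # int8
print(pick_compute_type("cuda"))  # float16
```

Leaving `compute_type="default"` lets the backend choose a sensible precision for the selected device.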

WhisperSTTServiceMLX (Apple Silicon) Parameters

| Parameter     | Type              | Default | Description              |
| ------------- | ----------------- | ------- | ------------------------ |
| `model`       | `str \| MLXModel` | `TINY`  | MLX model repository/ID. |
| `temperature` | `float`           | `0.0`   | Sampling temperature.    |
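To see why the default `temperature` of `0.0` gives reproducible transcripts, here is a library-independent sketch of temperature-based token selection (the function and scores are illustrative, not piopiy internals):

```python
import math
import random

# Illustrative sketch of how temperature affects decoding. At 0.0 the
# decoder is greedy (always picks the highest-scoring token), so the
# same audio produces the same transcript on every run.
def pick_token(logits: dict[str, float], temperature: float) -> str:
    if temperature == 0.0:
        return max(logits, key=logits.get)  # deterministic, greedy
    # Otherwise sample from a softmax sharpened/flattened by temperature.
    weights = [math.exp(score / temperature) for score in logits.values()]
    return random.choices(list(logits), weights=weights)[0]

scores = {"the": 2.0, "a": 1.0, "an": 0.5}
print(pick_token(scores, 0.0))  # the
```

Raising the temperature makes low-scoring tokens more likely to be chosen, which can help the model escape repetitive loops at the cost of determinism.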

Usage

Basic Setup (Faster Whisper)

from piopiy.services.whisper.stt import WhisperSTTService, Model
from piopiy.transcriptions.language import Language

stt = WhisperSTTService(
    model=Model.BASE,
    device="cpu",  # or "cuda" for NVIDIA GPUs
    language=Language.EN,
)

Apple Silicon Optimized (MLX)

from piopiy.services.whisper.stt import WhisperSTTServiceMLX, MLXModel
from piopiy.transcriptions.language import Language

stt = WhisperSTTServiceMLX(
    model=MLXModel.LARGE_V3_TURBO,
    language=Language.EN,
)

Notes

  • Initial Run: The first time you run a specific model, Piopiy downloads the model weights from Hugging Face — from several hundred megabytes to several gigabytes, depending on the model size.
  • Privacy: Local Whisper processing ensures that your audio data never leaves your server.
  • Hardware Acceleration: To use device="cuda" on NVIDIA GPUs, make sure the appropriate CUDA libraries are installed.