Cartesia

The CartesiaTTSService provides ultra-low-latency, high-fidelity text-to-speech synthesis using Cartesia's Sonic models. It is the recommended TTS provider for latency-sensitive conversational AI agents.

Installation

To use Cartesia, install the required dependencies:

pip install "piopiy-ai[cartesia]"

Prerequisites

  • A Cartesia account and API key.
  • Set your API key in your environment:
    export CARTESIA_API_KEY="your_api_key_here"
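
Note that os.getenv returns None when the variable is unset, so a missing key may only surface later as an authentication failure. A minimal sketch of a fail-fast check at startup, using only the standard library; the helper name require_cartesia_api_key is hypothetical and not part of Piopiy:

```python
import os


def require_cartesia_api_key(env=None):
    """Return the Cartesia API key, or raise a clear error if it is unset.

    Hypothetical helper, not part of Piopiy. `env` defaults to os.environ
    and can be replaced with a plain dict in tests.
    """
    env = os.environ if env is None else env
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; export it before starting the agent."
        )
    return key
```

The returned value can then be passed as api_key when constructing the service.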

Configuration

CartesiaTTSService Parameters

Parameter    Type         Default        Description
api_key      str          Required       Your Cartesia API key.
voice_id     str          Required       The ID of the voice to use.
model        str          "sonic-3"      The TTS model (e.g., sonic-3, sonic-multilingual).
container    str          "raw"          Audio container format.
encoding     str          "pcm_s16le"    Audio encoding format.
sample_rate  int          None           Audio sample rate (defaults to transport rate).
params       InputParams  InputParams()  Advanced voice settings.
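
With the default container "raw" and encoding "pcm_s16le", the audio is headerless signed 16-bit little-endian PCM, which means two bytes per sample per channel. As a sketch of what that implies, the following wraps such bytes in a WAV container using Python's standard wave module so they can be opened in an ordinary audio player. The function names, the 24000 Hz rate, and the mono channel count are illustrative assumptions, not values taken from Piopiy; use the rate your transport actually runs at.

```python
import io
import wave


def pcm_s16le_to_wav(pcm_bytes, sample_rate=24000, channels=1):
    """Wrap headerless 16-bit little-endian PCM in a WAV container.

    Illustrative helper, not part of Piopiy. sample_rate and channels
    are assumptions and must match how the audio was generated.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # pcm_s16le: 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


def pcm_duration_seconds(pcm_bytes, sample_rate=24000, channels=1):
    """Playback duration implied by the size of a raw pcm_s16le buffer."""
    return len(pcm_bytes) / (2 * channels * sample_rate)
```

A mismatch between the assumed and actual sample rate does not raise an error; it plays the audio too fast or too slow, so the rate is worth checking first when speech sounds pitched up or down.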

GenerationConfig (within params)

For Sonic-3 models, set generation_config in params to control volume, speed, and emotion:

Parameter  Type   Range      Description
volume     float  0.5 - 2.0  Volume multiplier.
speed      float  0.6 - 1.5  Numeric speed multiplier.
emotion    str    -          Emotion string (e.g., neutral, angry, excited).
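
Since volume and speed have documented ranges, values that come from user input or configuration files can be clamped before they reach the service. A minimal sketch, assuming the bounds are inclusive and that 1.0 is a sensible neutral default for both multipliers; the helper names are hypothetical and not part of Piopiy:

```python
# Documented ranges for Sonic-3 generation settings (inclusive bounds assumed).
VOLUME_RANGE = (0.5, 2.0)
SPEED_RANGE = (0.6, 1.5)


def clamp(value, bounds):
    """Limit value to the closed interval given by bounds."""
    low, high = bounds
    return max(low, min(high, value))


def safe_generation_kwargs(volume=1.0, speed=1.0, emotion=None):
    """Build keyword arguments with volume and speed clamped to range.

    Hypothetical helper. The emotion string is passed through unchanged
    because no closed list of valid values is documented here.
    """
    kwargs = {
        "volume": clamp(float(volume), VOLUME_RANGE),
        "speed": clamp(float(speed), SPEED_RANGE),
    }
    if emotion is not None:
        kwargs["emotion"] = emotion
    return kwargs
```

The resulting dictionary uses the same keyword names as the table, so it could be unpacked into GenerationConfig(**safe_generation_kwargs(speed=2.0)), which would pass a speed of 1.5.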

Usage

Basic Setup

import os
from piopiy.services.cartesia.tts import CartesiaTTSService

# Read the API key from the environment rather than hard-coding it.
tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="79a125e8-cd45-4c13-8a67-34653914c677",  # Helpful AI
)

With Expressive Controls (Sonic-3)

import os
from piopiy.services.cartesia.tts import CartesiaTTSService, GenerationConfig

# Expressive controls for Sonic-3 models.
config = GenerationConfig(
    speed=1.1,
    emotion="excited",
    volume=1.0,
)

tts = CartesiaTTSService(
    api_key=os.getenv("CARTESIA_API_KEY"),
    voice_id="79a125e8-cd45-4c13-8a67-34653914c677",
    params=CartesiaTTSService.InputParams(
        generation_config=config,
    ),
)

Notes

  • Language Support: Sonic-3 supports multiple languages; ensure you set language in InputParams for non-English use cases.
  • SSML Alternatives: Piopiy provides helper methods like SPELL(), PAUSE_TAG(), and EMOTION_TAG() to simplify text transformations.