Server API Introduction

Overview

The Piopiy Server SDK is a Python framework for building high-performance, real-time voice applications. It manages the complex orchestration required for voice AI, including streaming audio, handling interruptions (barge-in), and coordinating between different AI services.

Core Concepts

Piopiy is built on a few key architectural principles that ensure low latency and a natural conversational experience.

1. Asynchronous Execution

The SDK is built entirely on Python's asyncio. This allows the agent to handle multiple streams (audio in, audio out, transcription, LLM tokens) concurrently without blocking.

2. Pipeline Architecture

Your agent's brain is configured as a pipeline. Data flows through a series of "processors":

Transport: Manages raw audio input/output (Telephony, WebRTC).
STT: Transcribes speech to text.
Aggregators: Buffer transcripts or LLM tokens for logical processing.
LLM: Reasons about the conversation and generates responses.
TTS: Synthesizes response text back into audio.

3. State & Context

The VoiceAgent maintains a managed LLMContext, which automatically tracks conversation history, system instructions, and tool schemas.

Example Usage

Here is a look at how a basic pipeline is structured:

from piopiy.voice_agent import VoiceAgent
from piopiy.services.deepgram.stt import DeepgramSTTService
from piopiy.services.openai.llm import OpenAILLMService
from piopiy.services.cartesia.tts import CartesiaTTSService

async def create_session(call_id=None, **kwargs):
    voice_agent = VoiceAgent(instructions="You are a helpful assistant.")

    # Initialize Services
    stt = DeepgramSTTService(api_key="...", model="nova-2")
    llm = OpenAILLMService(api_key="...", model="gpt-4o")
    tts = CartesiaTTSService(api_key="...")

    # Build and Start Pipeline
    await voice_agent.Action(stt=stt, llm=llm, tts=tts)
    await voice_agent.start()

Next Steps

Speech-to-Text (STT): Transcribing voices in real-time.
Large Language Models (LLM): Conversational reasoning.
Text-to-Speech (TTS): Creating expressive voice responses.

Overview​

Core Concepts​

1. Asynchronous Execution​

2. Pipeline Architecture​

3. State & Context​

Example Usage​

Next Steps​