Large Language Models (LLMs)
Overview
In Piopiy, the LLM is your agent's reasoning layer. It receives transcribed user speech from STT, combines it with your instructions and the session context, and streams response tokens to TTS for playback.
How Piopiy Handles the LLM
Inside VoiceAgent.Action(...), Piopiy places the LLM between context aggregation and TTS:
- STT provides user text.
- Piopiy appends turns to session context.
- LLM generates streaming response tokens.
- TTS converts those tokens into spoken audio.
The LLM is session-scoped, so each live call keeps isolated conversation state.
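The turn flow above can be sketched as a small loop. This is an illustrative model only, with hypothetical names (`SessionContext`, `handle_turn`); the real SDK wires these steps together internally inside `VoiceAgent.Action(...)`:

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    # Session-scoped history: each live call gets its own instance.
    turns: list = field(default_factory=list)

    def append(self, role: str, text: str) -> None:
        self.turns.append({"role": role, "content": text})


def handle_turn(context: SessionContext, user_text: str, llm, tts) -> str:
    """One conversational turn: STT text in, spoken reply out."""
    context.append("user", user_text)      # 1-2: STT output joins the context
    reply = ""
    for token in llm(context.turns):       # 3: LLM streams response tokens
        tts(token)                         # 4: TTS speaks tokens as they arrive
        reply += token
    context.append("assistant", reply)     # keep the turn in session memory
    return reply
```

Because each call owns its own `SessionContext`, concurrent calls never share conversation state.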
Supported LLM Providers (SDK)
The list below is based on Piopiy SDK LLM integrations.
Implementation Example
```python
import os

from piopiy.voice_agent import VoiceAgent
from piopiy.services.openai.llm import OpenAILLMService


async def on_new_session(agent_id, call_id, from_number, to_number, metadata=None):
    voice_agent = VoiceAgent(
        instructions="You are a concise voice assistant.",
        greeting="Hello! How can I help you?",
    )
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4.1",
    )
    # initialize stt / tts as usual
    await voice_agent.Action(stt=stt, llm=llm, tts=tts)
    await voice_agent.start()
```
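Since the example reads `OPENAI_API_KEY` from the environment, it can help to fail fast when the variable is missing rather than erroring mid-call. This is a generic Python pattern, not a Piopiy-specific API (`require_env` is a hypothetical helper):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, failing fast if unset."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# e.g. api_key = require_env("OPENAI_API_KEY")  # raises before the session starts
```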
Best Practices
- Keep prompts short and voice-friendly.
- Prefer low-latency models for live calls.
- Use tool calling for precise backend actions rather than letting the model invent data.
- Keep a fallback provider for reliability.
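The fallback-provider practice can be sketched as a thin wrapper that retries a failed turn against a backup model. This is a hedged, illustrative pattern (`FallbackLLM` and `complete` are hypothetical names, not part of the Piopiy SDK):

```python
class FallbackLLM:
    """Route each turn to a primary LLM, falling back on provider failure."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback

    def complete(self, messages):
        try:
            return self.primary(messages)
        except Exception:
            # Primary provider errored (outage, rate limit, timeout):
            # retry the same turn against the backup model.
            return self.fallback(messages)
```

In practice you would pick a fast primary model for latency and a second provider (or a smaller model) as the backup, accepting slightly degraded quality over a dropped call.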
What's Next
- Function Calling: Connect the LLM to real backend actions using tool schemas.
- Context Management: Learn how conversation memory is maintained per session.
- Telephony: Deploy LLM-driven flows to real phone sessions.