Cerebras
Overview
The CerebrasLLMService integrates Cerebras's ultra-fast inference API, powered by their Wafer-Scale Engine (WSE). It is designed for applications that require the lowest possible latency with open-source models.
Installation
pip install piopiy-ai
Prerequisites
- A Cerebras API key.
- Set your API key in your environment:
export CEREBRAS_API_KEY="your_api_key_here"
Configuration
CerebrasLLMService Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | Required | Your Cerebras API key. |
| model | str | "llama3.1-8b" | Model identifier. |
| base_url | str | "https://api.cerebras.ai/v1" | API endpoint. |
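Cerebras exposes an OpenAI-compatible API, so these parameters map directly onto a standard chat-completions request. The helper below is an illustrative sketch (not part of piopiy) showing how the defaults combine into the URL, headers, and body the service would send:

```python
import os

def build_chat_request(model="llama3.1-8b",
                       base_url="https://api.cerebras.ai/v1",
                       api_key=None):
    """Assemble the URL, headers, and JSON body for an
    OpenAI-compatible chat completion against Cerebras."""
    api_key = api_key or os.getenv("CEREBRAS_API_KEY", "")
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": "Hello"}],
            "stream": True,  # stream tokens as they are generated
        },
    }

req = build_chat_request(api_key="sk-demo")
print(req["url"])  # https://api.cerebras.ai/v1/chat/completions
```

Overriding base_url changes only the endpoint prefix; the request shape stays the same.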
Usage
Basic Setup
import os
from piopiy.services.cerebras.llm import CerebrasLLMService
llm = CerebrasLLMService(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    model="llama3.1-70b"
)
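Because os.getenv returns None when the variable is unset, a missing key only surfaces as an authentication error on the first API call. A small fail-fast guard at startup gives a clearer message; require_env is an illustrative helper, not part of piopiy:

```python
import os

def require_env(name):
    """Return the value of an environment variable,
    raising a clear error at startup if it is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the app.")
    return value

# Example: fail fast here instead of on the first request.
# api_key = require_env("CEREBRAS_API_KEY")
```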
Notes
- Extreme Speed: Cerebras provides some of the highest token-per-second rates in the industry, making it exceptional for conversational AI.
- Model Support: Currently supports Llama 3.1 models; additional open-source models will become available as Cerebras adds them to its platform.
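To see why throughput matters for conversational latency, it helps to convert a tokens-per-second rate into generation time for a typical reply. The rate below is purely illustrative, not a measured Cerebras figure:

```python
def generation_time_ms(num_tokens, tokens_per_second):
    """Time (in milliseconds) to generate num_tokens at a given throughput."""
    return 1000 * num_tokens / tokens_per_second

# Illustrative: at 1,000 tokens/s, a 150-token spoken reply
# is generated in 150 ms, well within conversational turn-taking.
print(generation_time_ms(150, 1000))  # 150.0
```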