Cerebras

Overview

The CerebrasLLMService integrates Cerebras's ultra-fast inference API, powered by their Wafer-Scale Engine (WSE). It is designed for applications requiring the absolute lowest latency for open-source models.

Installation

pip install piopiy-ai

Prerequisites

  • A Cerebras API key (sign up at cloud.cerebras.ai to create one).
  • Set your API key in your environment:
    export CEREBRAS_API_KEY="your_api_key_here"
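Since the service reads the key from the environment, it can help to fail fast at startup when the variable is missing. A minimal sketch — `require_api_key` is an illustrative helper, not part of piopiy:

```python
import os

def require_api_key(var: str = "CEREBRAS_API_KEY") -> str:
    """Return the API key from the environment, raising early if it is unset."""
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting your application")
    return key
```

Calling this once at startup gives a clear error message instead of a failed API request later on.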

Configuration

CerebrasLLMService Parameters

| Parameter  | Type  | Default                        | Description            |
| ---------- | ----- | ------------------------------ | ---------------------- |
| `api_key`  | `str` | Required                       | Your Cerebras API key. |
| `model`    | `str` | `"llama3.1-8b"`                | Model identifier.      |
| `base_url` | `str` | `"https://api.cerebras.ai/v1"` | API endpoint.          |
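The defaults in the table can be pictured as a simple configuration record. The dataclass below is illustrative only — the real `CerebrasLLMService` accepts these values as constructor arguments:

```python
from dataclasses import dataclass

@dataclass
class CerebrasConfig:
    """Illustrative mirror of the parameter table; not part of piopiy."""
    api_key: str                                  # required, no default
    model: str = "llama3.1-8b"                    # default model identifier
    base_url: str = "https://api.cerebras.ai/v1"  # API endpoint
```

Only `api_key` must be supplied; `model` and `base_url` fall back to the defaults shown above.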

Usage

Basic Setup

import os
from piopiy.services.cerebras.llm import CerebrasLLMService

llm = CerebrasLLMService(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    model="llama3.1-70b",
)

Notes

  • Extreme Speed: Cerebras provides some of the highest token-per-second rates in the industry, making it exceptional for conversational AI.
  • Model availability: Llama 3.1 models are supported today; additional open-source models become available as Cerebras adds them to its platform.