Product

AI Inference API

Build AI features with a unified API for reliable chat completions and inference workflows. One endpoint, multiple models, full observability.


request.sh

curl https://api.chatinfer.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain AI inference in one sentence."}
    ],
    "temperature": 0.7
  }'

Why unified

One API for every model

Stop managing separate SDKs, authentication schemes, and rate limits for each LLM provider. ChatInfer gives you a single endpoint that sends each request to the model you name or to one selected by your routing rules.

Without ChatInfer

Multiple SDKs and authentication methods
Inconsistent request formats across providers
No centralized usage or cost tracking
Manual fallback handling on errors

With ChatInfer

One API key, one authentication method
Unified request format for all models
Centralized dashboard for usage and cost
Automatic retries and fallback routing

Capabilities

API capabilities

Everything you need to build reliable AI-powered features.

Chat completion API

A single, unified API for sending chat completion requests to multiple LLM providers. No need to manage separate SDKs or authentication.

Unified request format

Use the same request format regardless of the underlying model. Switch between providers without changing your integration code.
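As a rough illustration of what a unified request format could look like from client code, here is a minimal sketch. The helper name `build_chat_request` and the identifier `another-model` are hypothetical placeholders, not confirmed parts of the ChatInfer API; only the payload fields shown in the request example on this page are used.

```python
import json


def build_chat_request(model, user_prompt, temperature=0.7):
    """Return a JSON body for POST /v1/chat/completions.

    Only the "model" field differs between providers; every other
    field keeps the same shape.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
    }


first = build_chat_request("gpt-4o", "What is AI inference?")
# "another-model" is a placeholder, not a confirmed ChatInfer model name.
second = build_chat_request("another-model", "What is AI inference?")

print(json.dumps(first, indent=2))
```

Under this assumption, switching providers means changing one string and leaving the rest of the integration untouched.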

Usage monitoring

Track request volume, latency, token usage, and cost from a single dashboard. Understand how your AI features are performing.

Error handling

Built-in error handling with clear error codes, automatic retries, and fallback routing to keep your application reliable.

Model routing

Route requests to the best model for each use case. Configure routing rules based on cost, latency, or model capability.
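To make the idea of routing rules concrete, the sketch below picks a model by cost or latency among those that offer a required capability. It is an illustration only: the catalog entries, their numbers, and the `pick_model` function are invented placeholders, and ChatInfer's actual routing configuration may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    cost_per_1k_tokens: float  # placeholder value, not real pricing
    typical_latency_ms: int    # placeholder value, not a measurement
    capabilities: frozenset


# Hypothetical catalog used only to demonstrate the selection logic.
CATALOG = [
    ModelInfo("model-a", 0.50, 900, frozenset({"chat", "vision"})),
    ModelInfo("model-b", 0.10, 400, frozenset({"chat"})),
    ModelInfo("model-c", 0.25, 250, frozenset({"chat"})),
]

SORT_KEYS = {
    "cost": lambda m: m.cost_per_1k_tokens,
    "latency": lambda m: m.typical_latency_ms,
}


def pick_model(catalog, required_capability, optimize_for="cost"):
    """Filter by capability, then choose the cheapest or fastest model."""
    if optimize_for not in SORT_KEYS:
        raise ValueError(f"unknown objective: {optimize_for!r}")
    candidates = [m for m in catalog if required_capability in m.capabilities]
    if not candidates:
        raise ValueError(f"no model supports {required_capability!r}")
    return min(candidates, key=SORT_KEYS[optimize_for])


print(pick_model(CATALOG, "chat", "cost").name)
print(pick_model(CATALOG, "chat", "latency").name)
```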

Early access API keys

Early access users receive API keys with generous rate limits for testing and development before general availability.

Example

Request and response

Send a chat completion request and get a structured response with token usage.

Request

POST /v1/chat/completions

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is AI inference?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}

Response

200 OK

{
  "id": "chatcmpl-abc123",
  "model": "gpt-4o",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "AI inference is the process of a trained machine learning model making predictions or generating outputs based on new input data."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 24,
    "total_tokens": 52
  }
}
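One way to read the response above in client code is sketched here. The `summarize` helper is a hypothetical name; it relies only on the fields shown in the sample response and checks that the token counts add up.

```python
import json

# The sample response from this page, embedded as a string.
RESPONSE_BODY = """
{
  "id": "chatcmpl-abc123",
  "model": "gpt-4o",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "AI inference is the process of a trained machine learning model making predictions or generating outputs based on new input data."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 24,
    "total_tokens": 52
  }
}
"""


def summarize(response_text):
    """Extract the assistant message and token usage from a response body."""
    data = json.loads(response_text)
    choice = data["choices"][0]
    usage = data["usage"]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # Sanity check: prompt + completion should equal the reported total.
        "tokens_add_up": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }


summary = summarize(RESPONSE_BODY)
print(summary["finish_reason"], summary["total_tokens"], summary["tokens_add_up"])
# prints: stop 52 True
```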

Workflow

How it works

Get started with the ChatInfer Inference API in four steps.

Inference workflow

Authenticate

Include your API key in the Authorization header.

Send request

Send a chat completion request with model, messages, and parameters.

Receive response

Get back the model's response with token usage metadata.

Monitor usage

Track requests, latency, and cost from the dashboard.
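The authenticate and send steps can be sketched with Python's standard library as follows. This is a minimal example, not an official client: `make_request` is a hypothetical helper, and the request is only constructed here, not sent, so it runs without network access or a real key.

```python
import json
import urllib.request

API_URL = "https://api.chatinfer.com/v1/chat/completions"


def make_request(api_key, payload):
    """Build (but do not send) an authenticated chat completion request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # Step 1: the API key goes in the Authorization header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Step 2: model, messages, and parameters make up the request body.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is AI inference?"}],
    "temperature": 0.7,
}
req = make_request("YOUR_API_KEY", payload)
print(req.get_method(), req.full_url)

# Step 3 (not executed here): send the request and decode the JSON body.
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
```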

Reliability

Built for production

Your AI features need to be reliable. ChatInfer handles the infrastructure so you can focus on building.

Automatic retries

Transient failures are retried automatically with exponential backoff.
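For readers unfamiliar with exponential backoff, the sketch below shows the general technique. The retry count, base delay, cap, and use of random jitter are assumptions chosen for illustration; the page does not state the values ChatInfer uses, and `TransientError` is a hypothetical exception type.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delays(max_retries=4, base=0.5, cap=8.0):
    """Upper bound on the wait before each retry: base * 2**attempt, capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(operation, max_retries=4, base=0.5, cap=8.0,
                      sleep=time.sleep):
    """Run operation, retrying on TransientError with jittered backoff."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return operation()
        except TransientError:
            # Random jitter spreads retries out so clients do not retry in sync.
            sleep(random.uniform(0, delay))
    # Final attempt: any exception now propagates to the caller.
    return operation()


print(backoff_delays())
# prints: [0.5, 1.0, 2.0, 4.0]
```

The `sleep` parameter is injectable so the logic can be exercised in tests without real waiting.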

Fallback routing

If a model is unavailable, requests are routed to a fallback model.
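Conceptually, fallback routing tries an ordered list of models until one responds. The sketch below illustrates that idea on the client side; `ModelUnavailable`, `complete_with_fallback`, and the stand-in `fake_send` function are hypothetical, and the service may implement this differently.

```python
class ModelUnavailable(Exception):
    """Hypothetical error raised when a model cannot serve the request."""


def complete_with_fallback(send, models, payload):
    """Try each model in order and return the first successful result."""
    last_error = None
    for model in models:
        try:
            # Same payload every time; only the model field is swapped.
            return send({**payload, "model": model})
        except ModelUnavailable as exc:
            last_error = exc
    raise RuntimeError("all models unavailable") from last_error


def fake_send(body):
    """Stand-in transport: pretends the primary model is down."""
    if body["model"] == "primary-model":
        raise ModelUnavailable(body["model"])
    return {"model": body["model"], "content": "ok"}


result = complete_with_fallback(
    fake_send,
    ["primary-model", "backup-model"],  # placeholder model names
    {"messages": [{"role": "user", "content": "Hello"}]},
)
print(result["model"])
# prints: backup-model
```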

Rate limiting

Clear rate limits with informative headers and graceful degradation.
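The page does not name the rate limit headers, so the sketch below assumes the standard HTTP `Retry-After` header, which may carry either a number of seconds or an HTTP date. Treat both the header choice and the `retry_after_seconds` helper as assumptions; check the API documentation for the headers ChatInfer actually returns.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, now=None):
    """Return how long to wait based on Retry-After, or None if unusable.

    `headers` is a plain dict with the exact key "Retry-After".
    """
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        # Delay given directly in seconds.
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0.0, (when - now).total_seconds())


print(retry_after_seconds({"Retry-After": "30"}))
# prints: 30.0
```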

Observability

Monitor request status, latency, and error rates from the dashboard.

Start building with the Inference API

Get your API key and start shipping AI features today.
