Product
AI Inference API
Build AI features with a unified API for reliable chat completions and inference workflows. One endpoint, multiple models, full observability.
curl https://api.chatinfer.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain AI inference in one sentence."}
    ],
    "temperature": 0.7
  }'
Why unified
One API for every model
Stop managing separate SDKs, authentication, and rate limits for each LLM provider. ChatInfer gives you a single endpoint that routes to the best model for your use case.
Without ChatInfer
- Multiple SDKs and authentication methods
- Inconsistent request formats across providers
- No centralized usage or cost tracking
- Manual fallback handling on errors
With ChatInfer
- One API key, one authentication method
- Unified request format for all models
- Centralized dashboard for usage and cost
- Automatic retries and fallback routing
Capabilities
API capabilities
Everything you need to build reliable AI-powered features.
Chat completion API
A single, unified API for sending chat completion requests to multiple LLM providers. No need to manage separate SDKs or authentication.
Unified request format
Use the same request format regardless of the underlying model. Switch between providers without changing your integration code.
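As a rough illustration of what a unified request format means in practice, the sketch below builds the payload shown on this page for two different models. The helper name `build_chat_request` and the model name `another-model` are hypothetical placeholders, not part of the ChatInfer API; only `gpt-4o` and the field names come from the examples on this page.

```python
import json


def build_chat_request(model, user_prompt,
                       system_prompt="You are a helpful assistant.",
                       temperature=0.7):
    """Return a chat completion payload in the request format shown on this page."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
    }


# "gpt-4o" appears in the page's examples; "another-model" is a made-up name.
first = build_chat_request("gpt-4o", "What is AI inference?")
second = build_chat_request("another-model", "What is AI inference?")

# Only the "model" field differs between the two payloads.
changed_keys = [key for key in first if first[key] != second[key]]
print(changed_keys)  # ['model']
print(json.dumps(second, indent=2))
```

Under this assumption, switching models is a one-string change and the rest of the integration code stays the same.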
Usage monitoring
Track request volume, latency, token usage, and cost from a single dashboard. Understand how your AI features are performing.
Error handling
Built-in error handling with clear error codes, automatic retries, and fallback routing to keep your application reliable.
Model routing
Route requests to the best model for each use case. Configure routing rules based on cost, latency, or model capability.
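The idea behind rule-based routing can be sketched in a few lines. This is not ChatInfer's routing configuration, whose format is not shown on this page; the `ModelProfile` type, the `pick_model` helper, the model names, and every number below are hypothetical placeholders chosen only to illustrate selecting by cost, latency, or capability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_1k_tokens: float    # placeholder value, not real pricing
    typical_latency_ms: int      # placeholder value, not a measurement
    supports_long_context: bool  # stand-in for "model capability"


# Hypothetical catalog used only for illustration.
CATALOG = [
    ModelProfile("model-small", 0.10, 300, False),
    ModelProfile("model-large", 1.00, 900, True),
    ModelProfile("model-fast", 0.40, 150, False),
]


def pick_model(catalog, optimize_for, needs_long_context=False):
    """Pick the cheapest or fastest model among those with the required capability."""
    candidates = [
        m for m in catalog
        if m.supports_long_context or not needs_long_context
    ]
    if not candidates:
        raise ValueError("no model satisfies the capability requirement")
    if optimize_for == "cost":
        return min(candidates, key=lambda m: m.cost_per_1k_tokens).name
    if optimize_for == "latency":
        return min(candidates, key=lambda m: m.typical_latency_ms).name
    raise ValueError(f"unknown optimization target: {optimize_for}")


print(pick_model(CATALOG, "cost"))                           # model-small
print(pick_model(CATALOG, "latency"))                        # model-fast
print(pick_model(CATALOG, "cost", needs_long_context=True))  # model-large
```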
Early access API keys
Early access users receive API keys with generous rate limits for testing and development before general availability.
Example
Request and response
Send a chat completion request and get a structured response with token usage.
Request
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is AI inference?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}
Response
{
  "id": "chatcmpl-abc123",
  "model": "gpt-4o",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "AI inference is the process of a trained machine learning model making predictions or generating outputs based on new input data."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 24,
    "total_tokens": 52
  }
}
Workflow
How it works
Get started with the ChatInfer Inference API in four steps.
Inference workflow
Authenticate
Include your API key in the Authorization header.
Send request
Send a chat completion request with model, messages, and parameters.
Receive response
Get back the model's response with token usage metadata.
Monitor usage
Track requests, latency, and cost from the dashboard.
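The first three steps can be sketched with Python's standard library alone. The endpoint, headers, and JSON fields are taken from the examples on this page; the helper names `build_request`, `send`, and `read_reply` are illustrative assumptions rather than an official client. The fourth step happens in the dashboard, so it is not shown.

```python
import json
import urllib.request

# Endpoint taken from the curl example on this page.
API_URL = "https://api.chatinfer.com/v1/chat/completions"


def build_request(api_key, payload):
    """Steps 1 and 2: attach the API key and encode the chat completion payload."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Perform the network call and return the raw response body (needs a valid key)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def read_reply(response_body):
    """Step 3: extract the assistant text and the token usage metadata."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"], data["usage"]


payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is AI inference?"}],
    "temperature": 0.7,
    "max_tokens": 150,
}
request = build_request("YOUR_API_KEY", payload)
# body = send(request)  # uncomment once YOUR_API_KEY is replaced with a real key

# A body in the shape of the example response on this page (content shortened).
sample_body = json.dumps({
    "id": "chatcmpl-abc123",
    "model": "gpt-4o",
    "choices": [{
        "message": {"role": "assistant", "content": "AI inference is ..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 28, "completion_tokens": 24, "total_tokens": 52},
})
text, usage = read_reply(sample_body)
print(usage["total_tokens"])  # 52
```

Keeping request construction separate from the network call makes the payload and headers easy to inspect before any traffic is sent.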
Reliability
Built for production
Your AI features need to be reliable. ChatInfer handles the infrastructure so you can focus on building.
Automatic retries
Transient failures are retried automatically with exponential backoff.
Fallback routing
If a model is unavailable, requests are routed to a fallback model.
Rate limiting
Clear rate limits with informative headers and graceful degradation.
Observability
Monitor request status, latency, and error rates from the dashboard.
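For readers curious what retries with exponential backoff and fallback routing look like, here is a minimal client-side sketch of the general technique. ChatInfer describes handling this for you, and its actual retry counts, delays, and fallback order are not documented on this page; `TransientError`, `call_with_retries`, and the model names below are hypothetical.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as timeouts."""


def call_with_retries(send, models, max_attempts=3, base_delay=0.5,
                      sleep=time.sleep):
    """Try each model in order; retry transient failures with exponential backoff."""
    last_error = None
    for model in models:
        for attempt in range(max_attempts):
            try:
                return send(model)
            except TransientError as error:
                last_error = error
                if attempt < max_attempts - 1:
                    # The delay doubles after every failed attempt.
                    sleep(base_delay * 2 ** attempt)
        # Every attempt for this model failed; move on to the next model.
    raise RuntimeError("all models failed") from last_error


# Demo with a fake sender: the primary model always fails, the fallback succeeds.
delays = []


def fake_send(model):
    if model == "primary-model":
        raise TransientError("temporarily unavailable")
    return f"reply from {model}"


result = call_with_retries(
    fake_send,
    ["primary-model", "fallback-model"],
    sleep=delays.append,  # record the delays instead of actually sleeping
)
print(result)  # reply from fallback-model
print(delays)  # [0.5, 1.0]
```

Passing `sleep` as a parameter is a design choice that lets the backoff schedule be checked without waiting in real time.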
Start building with the Inference API
Get your API key and start shipping AI features today.