Create a Replicate account, copy your API token, store it securely, then integrate with Bifrost for virtual keys, budgets, and cost governance. Complete setup in minutes.
Bifrost supports routing and governance for Replicate predictions alongside your other AI providers.
| Property | Details |
|---|---|
| Description | Replicate provides serverless inference for image, video, audio, and language models from the Replicate model library. |
| Provider route on Bifrost | replicate/<model> |
| Provider doc | Replicate |
| API endpoint for provider | https://api.replicate.com/v1 |
| Supported endpoints | /v1/models, /v1/completions, /v1/chat/completions, /v1/responses, /v1/images/generations, /v1/images/edits, /v1/files, /v1/videos |
Use these Replicate links for API tokens, documentation, and the model explorer.
Before you begin, you will need:
[ QUICK START ]
Use replicate.com.
Go to replicate.com and sign up with your email or GitHub account.
Copy your default token or create a new one.
Go to replicate.com/account/api-tokens. Your default token is listed on this page.
Confirm your account if prompted.
Check your inbox for a verification link from Replicate and click it to activate your account.
Keep tokens out of source control.
Copy your token and export it as an environment variable:
export REPLICATE_API_TOKEN="r8_..."
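A minimal sketch of reading the token at application startup, assuming it was exported as `REPLICATE_API_TOKEN` as shown above. The helper names `load_token` and `mask` are hypothetical and not part of any SDK.

```python
import os


def load_token(env=None, name="REPLICATE_API_TOKEN"):
    """Return the token from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; export it before running")
    return token


def mask(token, visible=4):
    """Hide all but the first few characters so the token is safe to log."""
    return token[:visible] + "*" * max(len(token) - visible, 0)


if __name__ == "__main__":
    try:
        print("Using token", mask(load_token()))
    except RuntimeError as err:
        print(err)
```

Failing early on a missing token avoids a confusing 401 later, and masking keeps the full value out of logs.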
Use the Replicate Python SDK.
Install the SDK, then run a model:
$ pip install replicate
import replicate

# The SDK reads REPLICATE_API_TOKEN from the environment.
output = replicate.run(
    "meta/meta-llama-3-8b-instruct",
    input={"prompt": "Hello from Replicate!"}
)
print(output)
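Where the SDK is not available, the same call can be sketched against Replicate's HTTP API using only the standard library. This assumes the `POST /v1/models/{owner}/{name}/predictions` route and bearer-token authentication described in Replicate's public API reference; `build_prediction_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import os
import urllib.request

API_BASE = "https://api.replicate.com/v1"  # assumption: Replicate's public API host


def build_prediction_request(model, model_input, token):
    """Build (but do not send) a POST request that starts a prediction."""
    owner, name = model.split("/", 1)
    body = json.dumps({"input": model_input}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/models/{owner}/{name}/predictions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_prediction_request(
        "meta/meta-llama-3-8b-instruct",
        {"prompt": "Hello from Replicate!"},
        os.environ.get("REPLICATE_API_TOKEN", "r8_..."),
    )
    print(req.get_method(), req.full_url)
    # To send it, pass req to urllib.request.urlopen (needs a valid token and network).
```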
[ MODELS ]
| Model | API ID | Best for |
|---|---|---|
| FLUX.1 Schnell | black-forest-labs/flux-schnell | Fast text-to-image generation. |
| FLUX.1 Dev | black-forest-labs/flux-dev | Higher-quality image generation. |
| Stable Diffusion XL | stability-ai/sdxl | Classic SDXL image workflows. |
| Meta Llama 3 8B Instruct | meta/meta-llama-3-8b-instruct | Open chat on Replicate. |
| Meta Llama 3 70B Instruct | meta/meta-llama-3-70b-instruct | Larger Llama 3 chat. |
| OpenAI Whisper | openai/whisper | Speech-to-text. |
| LLaVA 13B | yorickvp/llava-13b | Vision-language Q&A. |
| Runway Gen-4 Turbo | runwayml/gen4-turbo | Video generation via Replicate. |
| DeepSeek R1 | deepseek-ai/deepseek-r1 | Reasoning model on Replicate. |
Models and availability change over time. See the Replicate model explorer for the latest list and pricing.
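The API IDs in the table follow an `owner/name` pattern, optionally pinned with `:version`, and the Bifrost route prefixes them with `replicate/`. A small sketch of that mapping follows; `parse_model_id` and `bifrost_route` are hypothetical helpers, and whether Bifrost accepts a pinned `:version` suffix is not confirmed here, so the route drops it.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]


def parse_model_id(api_id):
    """Split 'owner/name' or 'owner/name:version' into its parts."""
    path, _, version = api_id.partition(":")
    owner, sep, name = path.partition("/")
    if not (owner and sep and name) or "/" in name:
        raise ValueError(f"expected owner/name[:version], got {api_id!r}")
    return ModelRef(owner, name, version or None)


def bifrost_route(api_id):
    """Prefix a Replicate API ID with the provider name for use as `model`."""
    ref = parse_model_id(api_id)
    return f"replicate/{ref.owner}/{ref.name}"


if __name__ == "__main__":
    print(bifrost_route("black-forest-labs/flux-schnell"))
```

Validating the ID before sending a request catches the malformed-string cases that otherwise surface as 400 or 404 responses.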
[ TROUBLESHOOTING ]
| Error | Likely Cause | What to Do |
|---|---|---|
| 401 Unauthorized | Invalid or missing API token. | Verify your API token is correct. Create a new token in account settings if needed. |
| 402 Payment Required | Insufficient credits on your account. | Add billing credits in your Replicate account settings. |
| 404 Not Found | Model version string is wrong or unavailable. | Copy the full owner/name string from replicate.com/explore. |
| 400 Bad Request | Invalid model version string or missing input fields. | Use the full owner/name:version string from replicate.com/explore. |
| 429 Rate Limited | Insufficient credits or rate limits on your account. | Add billing credits or reduce concurrent prediction volume. Use Bifrost for intelligent load distribution. |
| 502/503 Service Error | Temporary Replicate service unavailability. | Retry after a delay. Check Replicate status page. Configure failover with Bifrost. |
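The table separates errors that need a fix (400, 401, 402, 404) from errors worth retrying (429, 502, 503). A minimal sketch of that policy with exponential backoff follows; `call_with_retry`, `ApiError`, and the delay values are illustrative assumptions, not part of the Replicate SDK or Bifrost.

```python
import time

RETRYABLE = {429, 502, 503}


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable status wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in RETRYABLE or last_attempt:
                raise  # bad token, no credits, or bad input: retrying will not help
            sleep(base_delay * 2 ** attempt)
```

In practice `fn` would wrap the prediction call and translate the client's error into `ApiError`; the `sleep` parameter exists so the delays can be replaced in tests.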
[ PRODUCTION-READY ]
Add your Replicate API token in Bifrost to track usage, enforce budgets with virtual keys, and manage providers alongside your LLM stack from one gateway.
Run the Bifrost gateway and add your Replicate API token in the Web UI.
$ npx -y @maximhq/bifrost
✓ Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at http://localhost:8080
└─ Configure providers and virtual keys in the dashboard
Point your OpenAI SDK at Bifrost
from openai import OpenAI

client = OpenAI(
    api_key="r8_...",  # Your Replicate key (via Direct Key Bypass) or "sk-bf-..." virtual key
    base_url="http://localhost:8080/openai"
)

response = client.chat.completions.create(
    model="replicate/meta/meta-llama-3-8b-instruct",
    messages=[{"role": "user", "content": "Hello from Bifrost!"}]
)
print(response.choices[0].message.content)
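The client above sends the key as a bearer token. For callers that do not use the OpenAI SDK, the same request can be sketched with the standard library, passing a virtual key in the `x-bf-vk` header instead. This assumes Bifrost serves `/v1/chat/completions` on the local gateway address from the startup output; `build_chat_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request

GATEWAY = "http://localhost:8080"  # local Bifrost address from the startup output


def build_chat_request(model, prompt, virtual_key):
    """Build (but do not send) a chat completion request for the gateway."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{GATEWAY}/v1/chat/completions",  # assumption: OpenAI-style route on Bifrost
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "x-bf-vk": virtual_key},
    )


if __name__ == "__main__":
    req = build_chat_request(
        "replicate/meta/meta-llama-3-8b-instruct", "Hello from Bifrost!", "sk-bf-..."
    )
    print(req.full_url)
```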
Pass a virtual key in the x-bf-vk header or as Authorization: Bearer sk-bf-*, per the Bifrost documentation.
[ WHAT'S NEXT ]
You have your API key. Add governance, guardrails, and MCP controls for production.
[ BIFROST FEATURES ]
Everything you need to run AI in production, from free open source to enterprise-grade features.
01 Governance
SAML support for SSO, role-based access control, and policy enforcement for team collaboration.
02 Adaptive Load Balancing
Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.
03 Cluster Mode
High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.
04 Alerts
Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook and more.
05 Log Exports
Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.
06 Audit Logs
Comprehensive logging and audit trails for compliance and debugging.
07 Vault Support
Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.
08 VPC Deployment
Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.
09 Guardrails
Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.
[ SHIP RELIABLE AI ]
Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.
[ FAQ ]
Replicate is a platform that runs open-source AI models on serverless GPUs. You call models with an API token and pay per second of compute.
New accounts receive free monthly credits for testing. Add a payment method when you need more throughput for production workloads.
Replicate hosts thousands of models for image, video, audio, and language tasks, including Stable Diffusion, Llama, Flux, Whisper, and more.
Replicate works with Bifrost. Add your Replicate API token in the Bifrost Web UI to track usage, enforce budgets with virtual keys, and manage providers from one gateway.
View usage in the Replicate dashboard. For per-team or per-app tracking across providers, route requests through Bifrost.
Replicate supports webhooks. They notify your application when long-running predictions finish, so you do not need to poll the API.
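A sketch of a webhook-enabled prediction body, assuming the `webhook` and `webhook_events_filter` fields from Replicate's HTTP API. The callback URL is a placeholder and `prediction_payload` is a hypothetical helper.

```python
import json


def prediction_payload(model_input, webhook_url=None, events=("completed",)):
    """Build the JSON body for a prediction, optionally with a webhook."""
    payload = {"input": model_input}
    if webhook_url:
        if not webhook_url.startswith("https://"):
            raise ValueError("webhook URL should use HTTPS")
        payload["webhook"] = webhook_url
        # Only ask for the events the application actually handles.
        payload["webhook_events_filter"] = list(events)
    return payload


if __name__ == "__main__":
    body = prediction_payload(
        {"prompt": "Hello from Replicate!"},
        webhook_url="https://example.com/replicate-callback",  # placeholder URL
    )
    print(json.dumps(body, indent=2))
```

Filtering to `completed` means the endpoint receives one call per prediction rather than one per log or output update.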