Try Bifrost Enterprise free for 14 days.


How to Get a Replicate API Key

Create a Replicate account, copy your API token, store it securely, then integrate with Bifrost for virtual keys, budgets, and cost governance. Complete setup in minutes.

API tokens, Serverless GPUs, Open-source models, Webhooks, Bifrost gateway

Replicate provider summary

Bifrost supports routing and governance for Replicate predictions alongside your other AI providers.

Property	Details
Description	Replicate provides serverless inference for image, video, audio, and language models from the Replicate model library.
Provider route on Bifrost	replicate/<model>
Provider doc	Replicate
API endpoint for provider	https://api.replicate.com/v1
Supported endpoints	/v1/models, /v1/completions, /v1/chat/completions, /v1/responses, /v1/images/generations, /v1/images/edits, /v1/files, /v1/videos

Official Replicate Resources

Use these Replicate links for API tokens, documentation, and the model explorer.

Prerequisites

Before you begin, you will need:

A Replicate account, an email address, and browser access.

Free credits: Replicate includes free monthly credits for new accounts. Add billing when you need higher throughput for production workloads.

[ QUICK START ]

How Do You Get a Replicate API Key in 5 Steps?

Create or sign in to a Replicate account

Use replicate.com.

Go to replicate.com and sign up with your email or GitHub account.

Open API tokens

Copy your default token or create a new one.

Go to replicate.com/account/api-tokens. Your default token is listed on this page.

Verify your email

Confirm your account if prompted.

Check your inbox for a verification link from Replicate and click it to activate your account.

Store your API token securely

Keep tokens out of source control.

Copy your token and export it as an environment variable:

Terminal (macOS/Linux)

export REPLICATE_API_TOKEN="r8_..."

Treat tokens like passwords: Never expose API tokens in client-side code or commit them to version control.
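In application code, the exported variable can be read at startup instead of pasting the token into a file. The following is a minimal sketch, not an official Replicate helper: the function name load_replicate_token is hypothetical, and the only thing taken from this guide is the REPLICATE_API_TOKEN variable name.

```python
import os
from typing import Mapping


def load_replicate_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the Replicate token from the environment, or raise a clear error.

    `env` is injectable so the function can be tested without touching
    the real process environment (hypothetical helper, not part of any SDK).
    """
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before running this script."
        )
    return token
```

Calling load_replicate_token() once at startup makes a missing token fail early with a readable message rather than surfacing later as a 401 response.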

Make your first prediction

Use the Replicate Python SDK.

Install the SDK, then run a model:

Terminal

$ pip install replicate

example.py

import replicate

output = replicate.run(
    "meta/meta-llama-3-8b-instruct",
    input={"prompt": "Hello from Replicate!"}
)

# Language models return their text in chunks; join them into one string.
print("".join(output))
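Under the hood, the SDK call is an authenticated HTTP request. The sketch below builds (but does not send) such a request with the standard library, assuming Replicate's predictions endpoint for official models at https://api.replicate.com/v1/models/<owner>/<name>/predictions and Bearer-token authentication. The helper name build_prediction_request is hypothetical.

```python
import json
import urllib.request

# Assumed base URL of Replicate's HTTP API.
API_BASE = "https://api.replicate.com/v1"


def build_prediction_request(
    token: str, model: str, model_input: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction."""
    owner, name = model.split("/", 1)
    return urllib.request.Request(
        url=f"{API_BASE}/models/{owner}/{name}/predictions",
        data=json.dumps({"input": model_input}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. For most applications the SDK remains the simpler choice, since it also handles polling and output decoding.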

[ MODELS ]

Popular Replicate Models

Model	API ID	Best for
FLUX.1 Schnell	black-forest-labs/flux-schnell	Fast text-to-image generation.
FLUX.1 Dev	black-forest-labs/flux-dev	Higher-quality image generation.
Stable Diffusion XL	stability-ai/sdxl	Classic SDXL image workflows.
Meta Llama 3 8B Instruct	meta/meta-llama-3-8b-instruct	Open chat on Replicate.
Meta Llama 3 70B Instruct	meta/meta-llama-3-70b-instruct	Larger Llama 3 chat.
OpenAI Whisper	openai/whisper	Speech-to-text.
LLaVA 13B	yorickvp/llava-13b	Vision-language Q&A.
Runway Gen-4 Turbo	runwayml/gen4-turbo	Video generation via Replicate.
DeepSeek R1	deepseek-ai/deepseek-r1	Reasoning model on Replicate.

Models and availability change over time. See the Replicate model explorer for the latest list and pricing.
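The API IDs in the table follow the owner/name pattern, optionally extended to owner/name:version, and Bifrost addresses the same models as replicate/<model>. A small sketch of validating an ID and deriving the Bifrost model name is shown here. Both function names are hypothetical, and dropping the version suffix for the Bifrost route is an assumption rather than documented behavior.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version: Optional[str]


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'owner/name' or 'owner/name:version' into its parts."""
    path, _, version = ref.partition(":")
    owner, sep, name = path.partition("/")
    if not (owner and sep and name) or "/" in name:
        raise ValueError(f"expected 'owner/name[:version]', got {ref!r}")
    return ModelRef(owner, name, version or None)


def bifrost_model(ref: str) -> str:
    """Prefix an owner/name ID with the 'replicate/' provider route.

    Assumption: the version suffix, if any, is not part of the route.
    """
    parsed = parse_model_ref(ref)
    return f"replicate/{parsed.owner}/{parsed.name}"
```

Validating the string before sending a request turns a malformed ID into a local ValueError instead of a 400 or 404 from the API.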

[ TROUBLESHOOTING ]

Troubleshooting Common Replicate API Errors

Error	Likely Cause	What to Do
`401 Unauthorized`	Invalid or missing API token.	Verify your API token is correct. Create a new token in account settings if needed.
`402 Payment Required`	Insufficient credits on your account.	Add billing credits in your Replicate account settings.
`404 Not Found`	Model version string is wrong or unavailable.	Copy the full owner/name string from replicate.com/explore.
`400 Bad Request`	Invalid model version string or missing input fields.	Use the full owner/name:version string from replicate.com/explore.
`429 Rate Limited`	Insufficient credits or rate limits on your account.	Add billing credits or reduce concurrent prediction volume. Use Bifrost for intelligent load distribution.
`502/503 Service Error`	Temporary Replicate service unavailability.	Retry after a delay. Check Replicate status page. Configure failover with Bifrost.
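The "retry after a delay" advice for 429, 502, and 503 responses can be sketched as a small exponential-backoff wrapper. This is an illustrative helper, not part of the Replicate SDK or Bifrost: call_with_retries, its parameters, and the (status, body) return convention of send are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple

# Statuses from the table above that are worth retrying.
RETRYABLE_STATUSES = {429, 502, 503}


def call_with_retries(
    send: Callable[[], Tuple[int, object]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff: base_delay, 2x, 4x, ... between attempts.
        sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Errors such as 401 or 402 are returned immediately, because retrying cannot fix a bad token or an empty balance. The sleep parameter is injectable so the wrapper can be tested without real delays.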

[ PRODUCTION-READY ]

Use Your Replicate Token with Bifrost

Add your Replicate API token in Bifrost to track usage, enforce budgets with virtual keys, and manage providers alongside your LLM stack from one gateway.

Step 1: Start Bifrost and register Replicate

Run the Bifrost gateway and add your Replicate API token in the Web UI.

Terminal

$ npx -y @maximhq/bifrost

OUTPUT

✓ Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at   http://localhost:8080
└─ Configure providers and virtual keys in the dashboard

Add the Replicate provider in the Web UI. For details, read supported providers on Bifrost.

Step 2: Point your OpenAI SDK at Bifrost

example.py

import os
from openai import OpenAI

client = OpenAI(
    # Your Replicate key (via Direct Key Bypass) or an "sk-bf-..." virtual key,
    # read from the environment rather than hardcoded in source.
    api_key=os.environ["REPLICATE_API_TOKEN"],
    base_url="http://localhost:8080/openai"
)

response = client.chat.completions.create(
    model="replicate/meta/meta-llama-3-8b-instruct",
    messages=[{"role": "user", "content": "Hello from Bifrost!"}]
)

print(response.choices[0].message.content)

Virtual keys can be sent as x-bf-vk or Authorization: Bearer sk-bf-* per the Bifrost documentation.
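The two options for sending a virtual key can be expressed as a small header builder. This is a sketch under the assumptions stated in this guide (the x-bf-vk header name and the sk-bf- key prefix). The function virtual_key_headers is hypothetical and not part of Bifrost.

```python
from typing import Dict


def virtual_key_headers(virtual_key: str, use_bearer: bool = False) -> Dict[str, str]:
    """Return the HTTP header that carries a Bifrost virtual key."""
    if not virtual_key.startswith("sk-bf-"):
        raise ValueError("expected a virtual key starting with 'sk-bf-'")
    if use_bearer:
        # Option 2: standard Authorization header.
        return {"Authorization": f"Bearer {virtual_key}"}
    # Option 1: dedicated Bifrost header.
    return {"x-bf-vk": virtual_key}
```

With the OpenAI Python client, the x-bf-vk form can be supplied through the client's default_headers argument, while the Bearer form corresponds to passing the virtual key as api_key.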

[ WHAT'S NEXT ]

Explore Bifrost Resources

You have your API key. Add governance, guardrails, and MCP controls for production.

Access Control

Governance

Virtual keys, budgets, rate limits, routing, and enterprise RBAC with SSO.

Security

Guardrails

PII detection, content moderation, prompt injection defense, and compliance.

MCP

MCP Gateway

High-performance tool execution for AI agents with approvals and audit trails.

Ready to Route Replicate Through Bifrost?

Bifrost is open source and production-ready. Get started in minutes with cost tracking, virtual keys, and failover built in.

[ BIFROST FEATURES ]

Open Source & Enterprise

Everything you need to run AI in production, from free open source to enterprise-grade features.

01 Governance

SAML-based SSO, role-based access control, and policy enforcement for team collaboration.

02 Adaptive Load Balancing

Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.

03 Cluster Mode

High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.

04 Alerts

Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook and more.

05 Log Exports

Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.

06 Audit Logs

Comprehensive logging and audit trails for compliance and debugging.

07 Vault Support

Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.

08 VPC Deployment

Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.

09 Guardrails

Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.

[quick setup]

Drop-in replacement for any AI SDK

Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.

import os
from anthropic import Anthropic

# Point the Anthropic client at Bifrost by changing only base_url.
anthropic = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://<bifrost_url>/anthropic",
)

message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ]
)

Drop in once, run everywhere.

[ FAQ ]

Frequently Asked Questions

What is Replicate?

Replicate is a platform that runs open-source AI models on serverless GPUs. You call models with an API token and pay per second of compute.

Does Replicate offer free credits?

Yes. New accounts receive free monthly credits for testing. Add a payment method when you need more throughput for production workloads.

Which models does Replicate host?

Replicate hosts thousands of models for image, video, audio, and language tasks, including Stable Diffusion, Llama, Flux, Whisper, and more.

Can I use Replicate with Bifrost?

Yes. Add your Replicate API token in the Bifrost Web UI to track usage, enforce budgets with virtual keys, and manage providers from one gateway.

How do I track Replicate usage?

View usage in the Replicate dashboard. For per-team or per-app tracking across providers, route requests through Bifrost.

Does Replicate support webhooks?

Yes. Webhooks notify your application when long-running predictions finish, so you do not need to poll the API.
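As a rough illustration of the webhook flow, the sketch below builds a prediction request body that names a callback URL and parses the JSON that arrives at that URL. It assumes the webhook and webhook_events_filter request fields and the status and output fields of the prediction object. The function names and the example URL are hypothetical.

```python
import json
from typing import Any, Optional


def prediction_payload(model_input: dict, webhook_url: str) -> dict:
    """Request body asking for a callback when the prediction completes."""
    return {
        "input": model_input,
        "webhook": webhook_url,
        # Only notify on the terminal event, not on every log line.
        "webhook_events_filter": ["completed"],
    }


def handle_webhook(body: bytes) -> Optional[Any]:
    """Return the prediction output if it succeeded, otherwise None."""
    prediction = json.loads(body)
    if prediction.get("status") == "succeeded":
        return prediction.get("output")
    return None
```

A production handler would also verify that the request really came from Replicate before trusting the body. That step is omitted here to keep the sketch short.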