

How to Get a Fireworks API Key

Create a Fireworks account at app.fireworks.ai, generate your API key, store it securely, then integrate with Bifrost for ultra-low-latency inference with virtual keys, budgets, and cost governance. Complete setup in minutes.

Console & keys · Bearer auth · OpenAI compatible · Low-latency inference · Bifrost gateway

Fireworks provider summary

Bifrost supports Fireworks models through OpenAI-compatible HTTP APIs and standard JSON request shapes for ultra-low-latency inference.

Property	Details
Description	Fireworks provides ultra-low-latency LLM inference for chat, reasoning, coding, and image workloads on popular open-source models.
Provider route on Bifrost	fireworks/<model>
Provider doc	Fireworks Documentation
API endpoint for provider	https://api.fireworks.ai/inference/v1
Supported endpoints	/v1/models, /v1/completions, /v1/chat/completions, /v1/embeddings, /v1/images/generations

Official Fireworks Resources

Use these Fireworks-hosted links for console access, API documentation, and authentication details.

Prerequisites

Before you begin, you will need:

Fireworks account
Email address

Free credits to start: Fireworks provides free credits for testing and development. Add billing in the console when you need higher limits for production workloads.

[ QUICK START ]

How Do You Get a Fireworks API Key in 5 Steps?

Step 1: Create or sign in to a Fireworks account

Use the Fireworks console.

Go to app.fireworks.ai and sign up or log in with your email address. Verify your account to access the dashboard.

Step 2: Open API Keys in Settings

After signing in, open your profile menu and select Settings. In the sidebar, click API Keys to view existing keys and create new ones.

Step 3: Create and copy your API key

Your key is displayed once. Copy it immediately and store it securely.

Click Create API Key, give it a descriptive name (for example development or production), then copy the key. Fireworks will not show the full key again after you leave the page.


Optional: Create separate keys per environment so you can revoke or rotate credentials independently.

Step 4: Store your API key securely

Export as an environment variable so SDKs can read it automatically.

Paste your key into a local environment variable (macOS / Linux):

Terminal (macOS/Linux)

export FIREWORKS_API_KEY="fw_..."

Treat keys like passwords: never expose API keys in client-side code or commit them to version control. If you keep them in a .env file, add that file to .gitignore.
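
Once the variable is exported, application code can read it at startup instead of hard-coding the key. The following is a minimal Python sketch of that idea. The helper names load_fireworks_key and mask_key are illustrative assumptions and are not part of any Fireworks or Bifrost SDK.

```python
import os


def load_fireworks_key(env_var: str = "FIREWORKS_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset.

    load_fireworks_key is a hypothetical helper name, not an SDK function.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running this program."
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Mask a key for logging so the full secret never reaches log output."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Failing fast on a missing variable tends to surface configuration mistakes before the first API call returns a 401.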

Step 5: Make your first Chat Completions call

Authenticate with Bearer tokens per Fireworks' OpenAI-compatible API.

Fireworks' API is OpenAI-compatible and uses Authorization: Bearer FIREWORKS_API_KEY for REST calls:

Terminal

$ curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role":"user","content":"Hello!"}]
  }'
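
The same request can be built from Python with only the standard library. This is a sketch under stated assumptions: build_chat_request and send_chat_request are hypothetical helper names, and the response parsing assumes the OpenAI-style shape (choices[0].message.content) that an OpenAI-compatible endpoint would be expected to return.

```python
import json
import urllib.request

FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    payload = {
        "model": "accounts/fireworks/models/llama-v3p3-70b-instruct",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        FIREWORKS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text.

    Assumes the OpenAI-style response shape; adjust if the body differs.
    """
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the headers and payload easy to inspect before any network traffic occurs.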

[ MODELS ]

Available Fireworks Models

Model	API ID	Best for
Llama 3.3 70B Instruct	accounts/fireworks/models/llama-v3p3-70b-instruct	Flagship Llama on Fireworks.
Llama 3.1 405B Instruct	accounts/fireworks/models/llama-v3p1-405b-instruct	Largest Llama 3.1 deployment.
Llama 3.1 70B Instruct	accounts/fireworks/models/llama-v3p1-70b-instruct	Production open-weight chat.
Llama 3.1 8B Instruct	accounts/fireworks/models/llama-v3p1-8b-instruct	Fast, economical inference.
Qwen3 235B A22B	accounts/fireworks/models/qwen3-235b-a22b	Large MoE Qwen3 on Fireworks.
DeepSeek V3	accounts/fireworks/models/deepseek-v3	DeepSeek flagship reasoning.
DeepSeek R1	accounts/fireworks/models/deepseek-r1	Chain-of-thought reasoning model.
Mistral Small 3	accounts/fireworks/models/mistral-small-24b-instruct-2501	Efficient Mistral tier.
FLUX.1 Dev	accounts/fireworks/models/flux-1-dev-fp8	Image generation on Fireworks.

Models and availability change over time. See the Fireworks model catalog for the latest list and pricing.
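
Because IDs change, a quick local check can catch a malformed model path before a request is sent. The pattern below is only inferred from the API IDs listed in the table above; it is an assumption rather than an official Fireworks specification, and is_catalog_style_id is a hypothetical helper name. It checks the shape of the ID, not whether the model currently exists.

```python
import re

# Shape inferred from the table above: accounts/<account>/models/<model-name>.
# This is an assumption for illustration, not an official format definition.
MODEL_ID_PATTERN = re.compile(r"accounts/[^/\s]+/models/[^/\s]+")


def is_catalog_style_id(model_id: str) -> bool:
    """Return True if model_id has the accounts/<account>/models/<name> shape."""
    return MODEL_ID_PATTERN.fullmatch(model_id) is not None
```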

[ TROUBLESHOOTING ]

Troubleshooting Common Fireworks API Errors

Error	Likely Cause	What to Do
`401 Unauthorized`	Invalid or missing API key.	Verify your API key is correct. Generate a new key if needed.
`400 Bad Request`	Invalid request format or unsupported model.	Check request format against OpenAI API reference. Verify model ID.
`429 Rate Limited`	Rate limit exceeded for your plan.	Upgrade your plan or implement exponential backoff. Use Bifrost for intelligent load distribution.
`404 Model Not Found`	Model not found or invalid model ID in request.	Confirm the model path matches the Fireworks catalog (for example `accounts/fireworks/models/...`).
`502/503 Service Error`	Temporary Fireworks service unavailability.	Retry after a delay. Check Fireworks status page. Configure failover with Bifrost.
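
The retry advice for 429 and 502/503 responses can be sketched as exponential backoff with jitter. This is a minimal illustration, not Fireworks' or Bifrost's own retry logic: call_with_backoff is a hypothetical helper, send stands for any callable you supply that returns a (status_code, body) pair, and the delay values are arbitrary defaults.

```python
import random
import time
from typing import Callable, Tuple

# Statuses the troubleshooting table treats as worth retrying.
RETRYABLE_STATUSES = {429, 502, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUSES:
            break
        # The upper bound doubles on each retry (0.5s, 1s, 2s, ...) up to
        # max_delay; full jitter then picks a random delay below that bound.
        upper = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, upper))
        status, body = send()
    return status, body
```

Errors such as 400, 401, and 404 are returned immediately, since retrying an invalid key or model ID would not be expected to succeed.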

[ PRODUCTION-READY ]

Use Your Fireworks Key with Bifrost

Bifrost is a drop-in replacement for direct Fireworks calls. Update your base URL and keep your client code. Bifrost handles cost tracking, virtual keys, budgets, and intelligent failover.

Step 1: Start Bifrost and register Fireworks

Run the Bifrost gateway and configure your Fireworks credentials in the Web UI.

Terminal

$ npx -y @maximhq/bifrost

OUTPUT

✓ Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at   http://localhost:8080
└─ Configure providers and virtual keys in the dashboard


Add the Fireworks integration in the Web UI. For details, read Fireworks on Bifrost.

Step 2: Point your OpenAI SDK at Bifrost

Update your SDK to route through Bifrost's OpenAI-compatible gateway instead of the direct Fireworks endpoint.

example.py

from openai import OpenAI

# BEFORE
# client = OpenAI(
#     api_key="your-fireworks-key",
#     base_url="https://api.fireworks.ai/inference/v1"
# )

# AFTER: route via Bifrost + virtual key
client = OpenAI(
    api_key="sk-bf-your-virtual-key",
    base_url="http://localhost:8080/openai"
)

# Bifrost routes Fireworks models as fireworks/<model> (see the provider summary).
response = client.chat.completions.create(
    model="fireworks/accounts/fireworks/models/llama-v3p3-70b-instruct",
    messages=[{"role": "user", "content": "Hello from Bifrost!"}]
)

print(response.choices[0].message.content)


Virtual keys can be sent as x-bf-vk or Authorization: Bearer sk-bf-* per the Bifrost documentation.
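
The two header styles mentioned above can be sketched as follows. The helper name virtual_key_headers is a hypothetical illustration and is not part of Bifrost; consult the Bifrost documentation for which style your deployment expects.

```python
from typing import Dict


def virtual_key_headers(virtual_key: str, use_bearer: bool = True) -> Dict[str, str]:
    """Return request headers carrying a Bifrost virtual key.

    use_bearer=True  -> Authorization: Bearer <virtual key>
    use_bearer=False -> x-bf-vk: <virtual key>
    """
    if use_bearer:
        return {"Authorization": f"Bearer {virtual_key}"}
    return {"x-bf-vk": virtual_key}
```

The Bearer form is what the OpenAI SDK sends when the virtual key is passed as api_key, as in example.py; the x-bf-vk form may suit clients that already use the Authorization header for something else.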

[ WHAT'S NEXT ]

Explore Bifrost Resources

You have your API key. Add governance, guardrails, and MCP controls for production.

Access Control

Governance

Virtual keys, budgets, rate limits, routing, and enterprise RBAC with SSO.

Security

Guardrails

PII detection, content moderation, prompt injection defense, and compliance.

MCP

MCP Gateway

High-performance tool execution for AI agents with approvals and audit trails.


Ready to Route Fireworks Through Bifrost?

Bifrost is open source and production-ready. Get started in minutes with cost tracking, virtual keys, and failover built in.

[ BIFROST FEATURES ]

Open Source & Enterprise

Everything you need to run AI in production, from free open source to enterprise-grade features.

01 Governance

SAML support for SSO, plus role-based access control and policy enforcement for team collaboration.

02 Adaptive Load Balancing

Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.

03 Cluster Mode

High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.

04 Alerts

Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook and more.

05 Log Exports

Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.

06 Audit Logs

Comprehensive logging and audit trails for compliance and debugging.

07 Vault Support

Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.

08 VPC Deployment

Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.

09 Guardrails

Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.

[ SHIP RELIABLE AI ]

Try Bifrost Enterprise with a 14-day Free Trial

[quick setup]

Drop-in replacement for any AI SDK

Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.

import os
from anthropic import Anthropic

# Point the Anthropic SDK at Bifrost by changing only the base URL.
anthropic = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://<bifrost_url>/anthropic",
)

message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

Drop in once, run everywhere.

[ FAQ ]

Frequently Asked Questions

Fireworks specializes in ultra-low-latency inference, targeting sub-100ms response times for many models through optimized infrastructure and quantized models, while hosting popular open-source models at scale.

Fireworks provides free credits to get started. Check your account dashboard for current free-tier allocations and usage limits.

Fireworks provides an OpenAI-compatible API, so you can use the official OpenAI Python or JavaScript SDKs with base URL https://api.fireworks.ai/inference/v1 and your Fireworks API key.

Fireworks offers popular open-source models including Llama, Mistral, Qwen, DeepSeek, and Code Llama with various parameter sizes. See the Fireworks model catalog for the latest list.

Bifrost integrates with Fireworks for low-latency inference, cost tracking, virtual keys, budgets, and automatic failover across providers.

Fireworks targets sub-100ms end-to-end latency for many models through its optimized inference engine. Actual latency depends on model size, request shape, and region.