Try Bifrost Enterprise free for 14 days.


How to Set Up Ollama

Install Ollama, download open-source models, start your local LLM server, then integrate with Bifrost for private, cost-free inference with multi-provider failover. Complete in 10 minutes.

Free & open source · Local inference · No API key needed · Private models · Bifrost gateway

Ollama provider summary

Bifrost supports local Ollama instances through their REST API endpoints. Run private, cost-free inference with a wide range of open-source models directly on your machine.

Property	Details
Description	Ollama enables local LLM inference with no API keys, no cloud costs, and complete privacy. Download models and run them on your machine.
Provider route on Bifrost	ollama/<model>
Provider doc	Ollama on GitHub
API endpoint for provider	http://localhost:11434
Supported endpoints	/v1/models, /v1/completions, /v1/chat/completions, /v1/responses, /v1/embeddings
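To see which local models Bifrost could route to, you can query Ollama's OpenAI-compatible /v1/models endpoint (listed in the table) and prefix each ID with ollama/. This is a minimal sketch that assumes the OpenAI-style response shape ({"data": [{"id": ...}]}). The helper names bifrost_routes and list_local_models are hypothetical. IDs usually carry a tag such as :latest, and whether Bifrost expects that tag kept is an assumption to confirm in the Bifrost docs.

```python
import json
import urllib.request
from typing import List


def bifrost_routes(models_response: dict) -> List[str]:
    """Map an OpenAI-style /v1/models listing to ollama/<model> route strings."""
    return [f"ollama/{m['id']}" for m in models_response.get("data", [])]


def list_local_models(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the model list from Ollama's OpenAI-compatible /v1/models endpoint."""
    with urllib.request.urlopen(base_url + "/v1/models", timeout=5) as resp:
        return json.load(resp)
```

With the server running, `bifrost_routes(list_local_models())` returns one route string per downloaded model.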

Official Ollama Resources

Use these resources for downloads, documentation, and model information.

Prerequisites

Before you begin, you will need:

macOS, Linux, or Windows
8 GB+ RAM (recommended)
GPU support (optional)

Completely free: Ollama is free and open source with no cloud costs, subscriptions, or API keys required.
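The 8 GB recommendation follows from a rough rule of thumb: a model's weights take about parameters × bits per weight ÷ 8 bytes. Ollama's default tags are usually 4-bit quantized, so a 7B model needs roughly 3.5 GB for weights alone, plus the context cache and runtime overhead. The helper below is a hypothetical back-of-the-envelope calculator, not an Ollama tool, and it deliberately counts only the weights.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of a model's weights in GB (1 GB = 1e9 bytes).

    Weights only: the KV cache, context length, and runtime overhead add more.
    """
    # params * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB.
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for label, params in [("7B", 7), ("13B", 13), ("70B", 70)]:
        print(f"{label} at 4-bit: ~{estimate_weight_gb(params, 4):.1f} GB of weights")
```

Treat the result as a floor: if it is close to your total RAM, pick a smaller model.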

[ QUICK START ]

How Do You Set Up Ollama in 5 Steps?

Download and install Ollama

Get Ollama for your operating system.

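Go to ollama.com and download the installer for macOS or Windows. On Linux, use the one-line install script shown on the site (curl -fsSL https://ollama.com/install.sh | sh).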

Download an open-source model

Open a terminal and run a pull command to download a model.

Terminal (macOS/Linux)

$ ollama pull llama2
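You can also pull models over the REST API, which is handy in scripts. Like the CLI, this needs the Ollama server to be running. A minimal sketch, assuming a recent Ollama version where POST /api/pull takes a "model" field (older releases used "name") and streams JSON status lines ending in {"status": "success"}. The helpers pull and pull_status are hypothetical names.

```python
import json
import urllib.request
from typing import Iterable


def pull_status(lines: Iterable[bytes]) -> str:
    """Return the last status from a streamed /api/pull response.

    Raises RuntimeError if the stream contains an {"error": ...} object.
    """
    last = ""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        if "error" in event:
            raise RuntimeError(event["error"])
        last = event.get("status", last)
    return last


def pull(model: str, base_url: str = "http://localhost:11434") -> str:
    """Pull a model through the REST API; returns the final status string."""
    req = urllib.request.Request(
        base_url + "/api/pull",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Large downloads can take minutes, so no timeout is set here.
    with urllib.request.urlopen(req) as resp:
        return pull_status(resp)
```

`pull("llama2") == "success"` should hold once the download completes.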

Start the Ollama server

The server runs locally on port 11434.

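The Ollama desktop app (macOS and Windows) starts the server automatically, and the Linux install script registers it as a systemd service where systemd is available. To run it explicitly: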

Terminal

$ ollama serve
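Before calling the API, it helps to confirm the server is up. A standard-library sketch, assuming the server's root path answers with the plain-text banner "Ollama is running" (what `curl http://localhost:11434` shows on current versions). The function name is_ollama_running is hypothetical.

```python
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url with its banner."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and "Ollama is running" in body
    except OSError:
        # URLError, HTTPError, refused connections, and timeouts all subclass OSError.
        return False


if __name__ == "__main__":
    print("Ollama is up" if is_ollama_running() else "Ollama is not reachable")
```

A False result usually means the server is not started, matching the "Connection refused" troubleshooting entry.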

Configure network access (optional)

Enable access from other machines if needed.

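By default, Ollama listens only on localhost (127.0.0.1:11434). To accept connections from other machines, set the OLLAMA_HOST environment variable before starting the server. Ollama has no built-in authentication, so only do this on a trusted network or behind a firewall: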

Terminal

$ export OLLAMA_HOST=0.0.0.0:11434
$ ollama serve
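OLLAMA_HOST takes a host[:port] value, with or without a scheme. 0.0.0.0 is a bind address meaning "all interfaces"; remote clients should dial the server's real IP instead. The sketch below turns such a value into a base URL a client can use. The helper ollama_base_url is hypothetical, and its rules (default scheme http, default port 11434, mapping 0.0.0.0 to loopback) are this sketch's assumptions.

```python
import os
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORT = 11434  # Ollama's default listen port


def ollama_base_url(host: Optional[str] = None) -> str:
    """Turn an OLLAMA_HOST-style value into a base URL a local client can dial."""
    if host is None:
        host = os.environ.get("OLLAMA_HOST", "")
    host = host.strip()
    if not host:
        return f"http://127.0.0.1:{DEFAULT_PORT}"
    if "://" not in host:
        host = "http://" + host  # OLLAMA_HOST is often given without a scheme
    parts = urlsplit(host)
    hostname = parts.hostname or "127.0.0.1"
    if hostname == "0.0.0.0":
        # "All interfaces" when binding; on the same machine, dial loopback.
        hostname = "127.0.0.1"
    if ":" in hostname:
        hostname = f"[{hostname}]"  # re-bracket IPv6 literals
    port = parts.port or DEFAULT_PORT
    return f"{parts.scheme}://{hostname}:{port}"


if __name__ == "__main__":
    print(ollama_base_url())
```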

Make your first API call

Generate text using your local model.

Call your local Ollama instance:

Terminal

$ curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama2",
    "prompt": "Hello from local LLM!",
    "stream": false
  }'
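Unless the request sets "stream": false, /api/generate streams its answer as newline-delimited JSON: each line carries a text fragment in "response", and the final line has "done": true. A standard-library sketch that reassembles that stream. The names generate and collect_stream are hypothetical, and llama2 must already be pulled.

```python
import json
import urllib.request
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(prompt: str, model: str = "llama2",
             base_url: str = "http://localhost:11434") -> str:
    """POST a prompt to Ollama's native /api/generate and return the full text."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # Iterating the response yields one line at a time, matching NDJSON framing.
        return collect_stream(resp)
```

With the server running, `print(generate("Hello from local LLM!"))` prints the whole completion once the stream finishes.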

[ MODELS ]

Popular Ollama Models

Model	API ID	Best for
Llama 3.3	llama3.3	Meta's 70B Llama 3.3 for local chat; needs a high-memory machine.
Llama 3.2	llama3.2	Small 1B and 3B Llama 3.2 models; use `llama3.2-vision` for image input.
Llama 3.1	llama3.1	Stable Llama 3.1 family.
Mistral	mistral	Compact Mistral 7B locally.
Qwen 2.5	qwen2.5	General-purpose Qwen; `qwen2.5-coder` is the code-tuned variant.
DeepSeek R1	deepseek-r1	Local reasoning model.
Gemma 2	gemma2	Google Gemma locally.
Phi-3	phi3	Small Microsoft Phi models.
Codestral	codestral	Code-focused Mistral model.
nomic-embed-text	nomic-embed-text	Local embeddings for RAG.

Model availability changes over time. See the Ollama model library for the current list.
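nomic-embed-text is the embedding model in the table: it returns vectors rather than text, which RAG pipelines compare by similarity. A minimal sketch, assuming Ollama's /api/embed endpoint (it takes "model" plus a list under "input" and returns "embeddings"). The helpers embed and cosine are hypothetical names, and the model must be pulled first with `ollama pull nomic-embed-text`.

```python
import json
import math
import urllib.request
from typing import List, Sequence


def embed(texts: Sequence[str], model: str = "nomic-embed-text",
          base_url: str = "http://localhost:11434") -> List[List[float]]:
    """Return one embedding vector per input string via Ollama's /api/embed."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["embeddings"]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With the server running, `q, d = embed(["query text", "document text"])` followed by `cosine(q, d)` scores how related the two strings are.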

[ TROUBLESHOOTING ]

Troubleshooting Common Ollama Issues

Issue	Likely Cause	What to Do
`Connection refused`	Ollama server not running.	Start the server with `ollama serve`, or check that the Ollama app is running in the menu bar (macOS) or system tray (Windows).
`Out of memory`	Model is too large for your system.	Use a smaller model such as `phi3` or `llama3.2:1b`, or add more RAM.
`Model not found`	Model hasn't been downloaded yet.	Run `ollama pull <model>` to download the model before using it.
`GPU not detected`	GPU drivers are missing or the GPU is unsupported.	Install current NVIDIA or AMD drivers. If no supported GPU is available, Ollama falls back to CPU inference.
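For the "Model not found" case, you can check what is already downloaded through Ollama's /api/tags endpoint, which lists local models under "models" with names such as llama2:latest. A sketch assuming that response shape and treating a bare name as its :latest tag (as `ollama pull` does). The helpers fetch_tags and model_is_pulled are hypothetical.

```python
import json
import urllib.request


def _normalize(name: str) -> str:
    # A bare model name refers to its ":latest" tag.
    return name if ":" in name else name + ":latest"


def model_is_pulled(tags: dict, wanted: str) -> bool:
    """Check an /api/tags response for a name like "llama2" or "mistral:7b"."""
    want = _normalize(wanted)
    return any(_normalize(m.get("name", "")) == want for m in tags.get("models", []))


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally downloaded models."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With the server running, `model_is_pulled(fetch_tags(), "llama2")` returns False if you still need to run `ollama pull llama2`.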

[ PRODUCTION-READY ]

Use Ollama with Bifrost

Bifrost is a drop-in gateway for OpenAI-compatible SDKs: change your client's base URL and keep the rest of your code. Bifrost adds virtual keys, cost tracking, and failover to cloud providers.

Step 1: Start Bifrost and register Ollama

Run the Bifrost gateway and configure your local Ollama instance in the Web UI.

Terminal

$ npx -y @maximhq/bifrost

OUTPUT

✓ Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at   http://localhost:8080
└─ Configure providers and virtual keys in the dashboard


In the Web UI, add the Ollama provider with http://localhost:11434 as its base URL. For details, read Ollama on Bifrost.

Step 2: Point your SDK at Bifrost

Point your OpenAI SDK client at Bifrost's unified gateway by changing its base URL.

example.py

from openai import OpenAI

client = OpenAI(
    api_key="sk-bf-your-virtual-key",
    base_url="http://localhost:8080/openai"
)

response = client.chat.completions.create(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hello from Bifrost!"}]
)

print(response.choices[0].message.content)


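Bifrost accepts the virtual key either in an x-bf-vk header or as Authorization: Bearer sk-bf-*, per the Bifrost documentation. The OpenAI SDK sends api_key as a Bearer token, so the example above uses the second form.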
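The two header styles are easiest to compare in a raw HTTP request. This standard-library sketch builds, but does not send, an OpenAI-style chat request. It assumes the OpenAI SDK convention of appending /chat/completions to base_url, so the path /openai/chat/completions is inferred from the SDK example rather than taken from Bifrost's docs. build_chat_request is a hypothetical name.

```python
import json
import urllib.request

BIFROST_BASE = "http://localhost:8080/openai"  # same base_url as the SDK example


def build_chat_request(virtual_key: str, prompt: str, model: str = "ollama/llama2",
                       use_header: str = "x-bf-vk") -> urllib.request.Request:
    """Build an OpenAI-style chat request carrying a Bifrost virtual key."""
    if use_header == "x-bf-vk":
        auth = {"x-bf-vk": virtual_key}
    elif use_header == "authorization":
        auth = {"Authorization": f"Bearer {virtual_key}"}
    else:
        raise ValueError("use_header must be 'x-bf-vk' or 'authorization'")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        BIFROST_BASE + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", **auth},
        method="POST",
    )
```

With Bifrost running, `urllib.request.urlopen(build_chat_request("sk-bf-your-virtual-key", "Hello from Bifrost!"))` sends the request.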

Next: Add cloud providers for failover

After local Ollama is set up, add cloud providers such as Groq or Mistral so Bifrost can fail over to a cloud model when the local instance is unavailable or returns errors.
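Bifrost performs failover inside the gateway once fallback providers are configured, so your client does not need this logic. To illustrate only the ordering idea, here is a client-side sketch that tries model routes in sequence and returns the first success. complete_with_fallback, the send callback, and any groq/... route name are hypothetical, and this is not how Bifrost implements failover.

```python
from typing import Callable, List, Sequence, Tuple


class AllRoutesFailed(Exception):
    """Raised when every model route in the list has failed."""


def complete_with_fallback(routes: Sequence[str],
                           send: Callable[[str], str]) -> Tuple[str, str]:
    """Try each route in order and return (route, reply) from the first success.

    `send` performs one request for a route and raises on failure.
    """
    errors: List[str] = []
    for route in routes:
        try:
            return route, send(route)
        except Exception as exc:  # a real client should catch only retryable errors
            errors.append(f"{route}: {exc}")
    raise AllRoutesFailed("; ".join(errors))
```

Listing the local route first keeps inference free and private while it is healthy; the cloud route only serves traffic when the local call fails.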

Groq Keys: add ultra-fast cloud inference.

Hugging Face: route through cloud models.

More Guides: view all API key setup guides.

[ WHAT'S NEXT ]

Explore Bifrost Resources

Your local Ollama instance now runs behind Bifrost. Add governance, guardrails, and MCP controls for production.

Access Control

Governance

Virtual keys, budgets, rate limits, routing, and enterprise RBAC with SSO.

Security

Guardrails

PII detection, content moderation, prompt injection defense, and compliance.

MCP

MCP Gateway

High-performance tool execution for AI agents with approvals and audit trails.


Ready to Route Ollama Through Bifrost?

Bifrost is open source and production-ready. Get started in minutes with cost tracking, virtual keys, and failover built in.

[ BIFROST FEATURES ]

Open Source & Enterprise

Everything you need to run AI in production, from free open source to enterprise-grade features.

01 Governance

SAML-based SSO, role-based access control, and policy enforcement for team collaboration.

02 Adaptive Load Balancing

Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.

03 Cluster Mode

High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.

04 Alerts

Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook and more.

05 Log Exports

Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.

06 Audit Logs

Comprehensive logging and audit trails for compliance and debugging.

07 Vault Support

Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.

08 VPC Deployment

Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.

09 Guardrails

Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.

[ SHIP RELIABLE AI ]

Try Bifrost Enterprise with a 14-day Free Trial

[quick setup]

Drop-in replacement for any AI SDK

Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.

import os
from anthropic import Anthropic

# Point the Anthropic SDK at Bifrost's /anthropic route instead of the Anthropic API.
anthropic = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://<bifrost_url>/anthropic",
)

message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)

# The reply text is in the first content block.
print(message.content[0].text)

Drop in once, run everywhere.

[ FAQ ]

Frequently Asked Questions

Ollama is completely free and open source. You run it on your own machine with no cloud costs or subscription fees.

Ollama supports a wide range of open-source models, including Llama 2, Mistral, Neural Chat, Code Llama, and many others. Download them from the Ollama model library.

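Ollama runs locally and requires no API key. It has no built-in authentication: by default it accepts requests only from localhost, and if you expose it on a network you should put it behind a reverse proxy or firewall that handles access control.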

Ollama runs on macOS, Linux, and Windows. The Ollama README suggests at least 8 GB of RAM for 7B models, 16 GB for 13B models, and 32 GB for 33B models. A GPU is optional but recommended for faster inference.

Standard OpenAI SDKs work with Ollama models. Ollama itself serves OpenAI-compatible endpoints under /v1, and Bifrost adds OpenAI-compatible routing on top, with virtual keys, cost tracking, and failover.

Bifrost connects to your local Ollama instance, providing virtual keys, cost tracking, and multi-provider failover. Run Ollama locally and configure Bifrost to route to it.
