Try Bifrost Enterprise free for 14 days.

How to Set Up Ollama

Install Ollama, download open-source models, start your local LLM server, then integrate with Bifrost for private, cost-free inference with multi-provider failover. Complete in 10 minutes.

Free & open source · Local inference · No API key needed · Private models · Bifrost gateway

Ollama provider summary

Bifrost supports local Ollama instances through REST API endpoints. Run private, cost-free inference with thousands of open-source models directly on your machine.

Property | Details
Description | Ollama enables local LLM inference with no API keys, no cloud costs, and complete privacy. Download models and run them on your machine.
Provider route on Bifrost | ollama/<model>
Provider doc | Ollama on GitHub
API endpoint for provider | http://localhost:11434
Supported endpoints | /v1/models, /v1/completions, /v1/chat/completions, /v1/responses, /v1/embeddings
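The endpoints in the table accept OpenAI-style JSON. Below is a minimal standard-library sketch that builds the same chat request two ways: directly against Ollama with a plain model name, and through Bifrost with the ollama/<model> route. The Bifrost address http://localhost:8080 comes from the quick-start output further down this page; that Bifrost serves /v1/chat/completions at that address, and the helper names build_chat_request and send_chat, are assumptions to verify against the Bifrost docs.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_chat_request(
    base_url: str,
    model: str,
    prompt: str,
    headers: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build an OpenAI-style POST to {base_url}/v1/chat/completions (hypothetical helper)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )


def send_chat(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Directly against Ollama: plain model name.
direct = build_chat_request("http://localhost:11434", "llama2", "Hello!")

# Through Bifrost (assumed at localhost:8080): provider-prefixed route plus virtual key.
via_bifrost = build_chat_request(
    "http://localhost:8080",
    "ollama/llama2",
    "Hello!",
    headers={"Authorization": "Bearer sk-bf-your-virtual-key"},
)
```

Building a request does not touch the network; call send_chat(direct) once Ollama is running with llama2 pulled. Only the base URL, the model prefix, and the auth header differ between the two targets.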


Prerequisites

Before you begin, you will need:

- macOS, Linux, or Windows
- 8 GB+ RAM (recommended)
- GPU support (optional)
Completely free: Ollama is free and open source with no cloud costs, subscriptions, or API keys required.
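How much RAM is "enough" depends on model size. The Ollama README gives rough guidance: at least 8 GB of RAM for 7B models, 16 GB for 13B models, and 32 GB for 33B models. The sketch below encodes that guidance and rounds a model's size up to the next tier; the helper names are hypothetical, and the sysconf-based memory probe works on Linux and macOS but returns None elsewhere (for example, Windows).

```python
import os
from typing import Optional

# Guidance from the Ollama README: (max model size in billions of params, RAM in GB).
RAM_TIERS_GB = [(7, 8), (13, 16), (33, 32)]


def recommended_ram_gb(params_billions: float) -> Optional[int]:
    """Round a model size up to the next published tier; None if beyond the guidance."""
    for max_params, ram_gb in RAM_TIERS_GB:
        if params_billions <= max_params:
            return ram_gb
    return None


def total_ram_gb() -> Optional[float]:
    """Physical memory in GB via sysconf; None on platforms without these names."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None
```

Rounding up is deliberately conservative: quantized 8B models often run in less memory than the 13B tier suggests.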

[ QUICK START ]

How Do You Set Up Ollama in 5 Steps?

Step 1: Download and install Ollama

Get Ollama for your operating system.

Go to ollama.com and download the installer for macOS or Windows, or use the install script provided there for Linux.
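To confirm the install, you can check that the ollama CLI is on your PATH and ask it for its version. This sketch assumes `ollama --version` prints text containing an x.y.z version (for example "ollama version is 0.5.7"); the wording differs between releases, so it only extracts the first dotted version number. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first x.y.z version number from CLI output, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_ollama_version() -> Optional[Tuple[int, ...]]:
    """Return the installed CLI version, or None if ollama is not on PATH."""
    exe = shutil.which("ollama")
    if exe is None:
        return None
    # stderr is included because some releases print warnings there.
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, timeout=10)
    return parse_version(result.stdout + result.stderr)
```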

Step 2: Download an open-source model

Open a terminal and run a pull command to download a model.

Terminal (macOS/Linux)
$ ollama pull llama2
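The same download can be driven from code through Ollama's REST API. A minimal sketch, assuming the POST /api/pull endpoint, which streams one JSON status object per line and includes total and completed byte counts while layers download. Current API docs name the body field model; older releases documented name. The helpers pull and progress_percent are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def progress_percent(event: Dict[str, Any]) -> Optional[float]:
    """Percent complete for one /api/pull status object, or None if it has no byte counts."""
    total = event.get("total")
    completed = event.get("completed")
    if not total or completed is None:
        return None
    return 100.0 * completed / total


def pull(model: str, base_url: str = "http://localhost:11434") -> None:
    """Stream POST /api/pull and print each status line (needs a running server)."""
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # the body is newline-delimited JSON
            if not raw.strip():
                continue
            event = json.loads(raw)
            pct = progress_percent(event)
            status = event.get("status", "")
            print(f"{status} {pct:.1f}%" if pct is not None else status)
```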

Step 3: Start the Ollama server

The server runs locally on port 11434.

The macOS and Windows apps, and the systemd service set up by the Linux install script, start the server automatically. Otherwise, run it explicitly:

Terminal
$ ollama serve
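Before sending requests, it helps to check that the server is actually up. Ollama's root URL replies with the plain text "Ollama is running". The sketch below treats any connection, timeout, or HTTP failure as "not running"; the helper name is hypothetical.

```python
import http.client
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the server answers its root URL with 'Ollama is running'."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status == 200 and "Ollama is running" in body
    except (OSError, http.client.HTTPException):
        # URLError, refused connections, and timeouts are all OSError subclasses.
        return False
```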

Step 4: Configure network access (optional)

Enable access from other machines if needed.

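By default, Ollama listens only on localhost. To allow access from other machines, set the OLLAMA_HOST environment variable before starting the server. Ollama has no built-in authentication, so expose it only on trusted networks: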

Terminal
$ export OLLAMA_HOST=0.0.0.0:11434
$ ollama serve
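On the client side, 0.0.0.0 is a bind-all address, not a destination: other machines connect to the server's LAN IP instead. The hypothetical helper below turns an OLLAMA_HOST-style value into a base URL a client can use, adding a scheme and the default port 11434 when they are missing. It is an illustration only, not the parser the ollama CLI uses, and it always falls back to port 11434 regardless of scheme.

```python
import os
from urllib.parse import urlsplit

DEFAULT_PORT = 11434


def base_url_from_host(value: str) -> str:
    """Normalize an OLLAMA_HOST-style value (e.g. '0.0.0.0:11434') into a client base URL."""
    value = value.strip() or f"127.0.0.1:{DEFAULT_PORT}"
    if "://" not in value:
        value = "http://" + value
    parts = urlsplit(value)
    host = parts.hostname or "127.0.0.1"
    if host == "0.0.0.0":
        host = "127.0.0.1"  # on the server machine itself, reach the bind-all listener via loopback
    if ":" in host:
        host = f"[{host}]"  # re-bracket IPv6 literals
    port = parts.port or DEFAULT_PORT
    return f"{parts.scheme}://{host}:{port}"


# Example: resolve the base URL from the environment, if set.
BASE_URL = base_url_from_host(os.environ.get("OLLAMA_HOST", ""))
```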

Step 5: Make your first API call

Generate text using your local model.

Call your local Ollama instance:

Terminal
$ curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama2",
    "prompt": "Hello from local LLM!"
  }'
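/api/generate streams by default: the curl command above prints one JSON object per line, each carrying a text fragment in its response field, with done set to true on the last one (sending "stream": false returns a single object instead). A minimal Python equivalent that reassembles the stream; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable


def collect_generate_stream(lines: Iterable[str]) -> str:
    """Concatenate the 'response' fragments of an /api/generate stream, stopping at done=true."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(model: str, prompt: str, base_url: str = "http://localhost:11434") -> str:
    """POST /api/generate (streaming by default) and return the full text (needs a running server)."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return collect_generate_stream(raw.decode("utf-8") for raw in resp)
```

For example, generate("llama2", "Hello from local LLM!") returns the whole reply as one string.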

[ MODELS ]

Popular Ollama Models

Model | API ID | Best for
Llama 3.3 | llama3.3 | Meta's 70B Llama 3.3 for high-quality local chat (needs substantial memory).
Llama 3.2 | llama3.2 | Lightweight 1B and 3B Llama 3.2 models.
Llama 3.1 | llama3.1 | Stable Llama 3.1 family.
Mistral | mistral | Compact Mistral 7B locally.
Qwen 2.5 | qwen2.5 | Strong general-purpose Qwen model (see qwen2.5-coder for code).
DeepSeek R1 | deepseek-r1 | Local reasoning model.
Gemma 2 | gemma2 | Google Gemma locally.
Phi-3 | phi3 | Small Microsoft Phi models.
Codestral | codestral | Code-focused Mistral model.
nomic-embed-text | nomic-embed-text | Local embeddings for RAG.

Models and availability change over time. See the Ollama model library for the latest list.
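To see which of these models are already downloaded, you can query GET /api/tags, which returns a models array whose entries include a name such as "llama2:latest". The sketch below also treats an untagged name as ":latest", matching how Ollama resolves untagged names. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> Dict[str, Any]:
    """GET /api/tags, which lists locally downloaded models (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def installed_model_names(tags: Dict[str, Any]) -> List[str]:
    """Pull the 'name' field out of each entry in the /api/tags payload."""
    return [m["name"] for m in tags.get("models", [])]


def has_model(names: List[str], wanted: str) -> bool:
    """Check for a model, treating an untagged name as ':latest'."""
    if ":" not in wanted.rsplit("/", 1)[-1]:  # look for a tag only after the last '/'
        wanted += ":latest"
    return wanted in names
```

Checking has_model(installed_model_names(fetch_tags()), "llama2") before a request avoids the "Model not found" error below.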

[ TROUBLESHOOTING ]

Troubleshooting Common Ollama Issues

Issue | Likely cause | What to do
Connection refused | Ollama server not running. | Start the server with `ollama serve`, or check that the Ollama app is running in the system tray or menu bar.
Out of memory | Model is too large for your system. | Use a smaller model such as `phi3` or `llama3.2`, or add more RAM.
Model not found | Model hasn't been downloaded yet. | Run `ollama pull <model>` to download the model before using it.
GPU not detected | GPU drivers not installed, or GPU not supported. | Install current NVIDIA or AMD drivers. Ollama falls back to CPU inference if no supported GPU is available.
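When calling the API from code, the first three rows surface as different exceptions. The heuristic below maps urllib errors onto the table; it assumes Ollama answers an unknown model with HTTP 404 and a failed model load (such as running out of memory) with HTTP 500, so treat the mapping as a starting point and confirm against the server logs. The helper name is hypothetical.

```python
import urllib.error


def diagnose(error: BaseException) -> str:
    """Map a failed urllib call to Ollama onto the troubleshooting table (heuristic)."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(error, urllib.error.HTTPError):
        if error.code == 404:
            return "Model not found: run `ollama pull <model>` first."
        if error.code == 500:
            return "Server error: check the Ollama logs; the model may not fit in memory."
        return f"HTTP {error.code}: check the Ollama server logs."
    if isinstance(error, urllib.error.URLError) and isinstance(error.reason, ConnectionRefusedError):
        return "Connection refused: start the server with `ollama serve`."
    return f"Unrecognized error: {error!r}"
```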

[ PRODUCTION-READY ]

Use Ollama with Bifrost

Bifrost works as a drop-in replacement for your Ollama endpoint: change the base URL your SDK points to and keep the rest of your client code. Bifrost adds virtual keys, cost tracking, and failover to cloud providers.

Step 1: Start Bifrost and register Ollama

Run the Bifrost gateway and configure your local Ollama instance in the Web UI.

Terminal
$ npx -y @maximhq/bifrost
OUTPUT
 Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at   http://localhost:8080
└─ Configure providers and virtual keys in the dashboard
Add the Ollama integration pointing to http://localhost:11434 in the Web UI. For details, read Ollama on Bifrost.

Step 2: Point your SDK at Bifrost

Update your SDK to route through Bifrost's unified gateway.

example.py
from ollama import Client

# BEFORE
# client = Client(host="http://localhost:11434")

# AFTER: route via Bifrost + virtual key
client = Client(
    host="http://localhost:8080/ollama",
    headers={"Authorization": "Bearer sk-bf-your-virtual-key"}
)

response = client.generate(
    model="llama2",
    prompt="Hello from Bifrost!"
)

print(response.response)
Virtual keys can be sent as x-bf-vk or Authorization: Bearer sk-bf-* per the Bifrost documentation.
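To keep the key out of source code, you can read it from the environment and build whichever header form you prefer. A minimal sketch: the two header forms come from the note above, while the helper names and the BIFROST_VIRTUAL_KEY variable name are hypothetical.

```python
import os
from typing import Dict, Optional


def virtual_key_headers(key: str, style: str = "bearer") -> Dict[str, str]:
    """Return the header carrying a Bifrost virtual key in either documented form."""
    if style == "bearer":
        return {"Authorization": f"Bearer {key}"}
    if style == "x-bf-vk":
        return {"x-bf-vk": key}
    raise ValueError(f"unknown style: {style!r}")


def virtual_key_from_env(var: str = "BIFROST_VIRTUAL_KEY") -> Optional[str]:
    """Read the virtual key from an environment variable (variable name is hypothetical)."""
    return os.environ.get(var)
```

With the Ollama SDK, pass the result as Client(host="http://localhost:8080/ollama", headers=virtual_key_headers(key)).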

Next: Add cloud providers for failover

Once local Ollama works, add cloud providers such as Groq or Mistral so Bifrost can fail over to them when requests to your local instance fail.
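Per this page, Bifrost handles failover inside the gateway, so client code normally does not need to. Purely to illustrate the ordering idea, here is a client-side sketch that tries provider/model routes in sequence through a caller-supplied send function. This is not how Bifrost implements failover; complete_with_fallback and the groq/<model> placeholder are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def complete_with_fallback(
    routes: Sequence[str], send: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each provider/model route in order; return (route, reply) from the first success."""
    failures: List[str] = []
    for route in routes:
        try:
            return route, send(route)
        except Exception as exc:  # broad on purpose: any failure moves on to the next route
            failures.append(f"{route}: {exc}")
    raise RuntimeError("all routes failed: " + "; ".join(failures))
```

For example, complete_with_fallback(["ollama/llama2", "groq/<model>"], send) tries the local model first and only reaches the cloud route if that call raises.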

Ready to Route Ollama Through Bifrost?

Bifrost is open source and production-ready. Get started in minutes with cost tracking, virtual keys, and failover built in.

[ BIFROST FEATURES ]

Open Source & Enterprise

Everything you need to run AI in production, from free open source to enterprise-grade features.

01 Governance

SAML-based single sign-on (SSO), role-based access control, and policy enforcement for team collaboration.

02 Adaptive Load Balancing

Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.

03 Cluster Mode

High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.

04 Alerts

Real-time notifications for budget limits, failures, and performance issues via email, Slack, PagerDuty, Microsoft Teams, webhooks, and more.

05 Log Exports

Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.

06 Audit Logs

Comprehensive logging and audit trails for compliance and debugging.

07 Vault Support

Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.

08 VPC Deployment

Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.

09 Guardrails

Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.


[ QUICK SETUP ]

Drop-in replacement for any AI SDK

Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.

import os
from anthropic import Anthropic

anthropic = Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
    base_url="https://<bifrost_url>/anthropic",
)

message = anthropic.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello, Claude"}
    ],
)
Drop in once, run everywhere.

[ FAQ ]

Frequently Asked Questions

Yes, Ollama is completely free and open source. Run it on your local machine with no cloud costs or subscription fees.

Ollama supports thousands of open-source models including Llama 2, Mistral, Neural Chat, Code Llama, and others. Download models from the Ollama library.

No. Ollama runs locally and requires no API key. It also has no built-in authentication, so if you expose it beyond localhost, put it behind a reverse proxy or gateway that enforces access control.

Ollama runs on macOS, Linux, and Windows. Plan for at least 8 GB of RAM for 7B models; larger models need more memory. A GPU is optional but speeds up inference considerably.

Yes. Ollama itself exposes OpenAI-compatible endpoints under /v1, and Bifrost provides OpenAI-compatible routing for Ollama models, so you can use standard SDKs with local LLMs.

Bifrost connects to your local Ollama instance, providing virtual keys, cost tracking, and multi-provider failover. Run Ollama locally and configure Bifrost to route to it.