Install Ollama, download open-source models, start your local LLM server, then integrate with Bifrost for private, cost-free inference with multi-provider failover. Complete in 10 minutes.
Bifrost supports local Ollama instances through REST API endpoints. Run private, cost-free inference with thousands of open-source models directly on your machine.
| Property | Details |
|---|---|
| Description | Ollama enables local LLM inference with no API keys, no cloud costs, and complete privacy. Download models and run them on your machine. |
| Provider route on Bifrost | ollama/<model> |
| Provider doc | Ollama on GitHub |
| API endpoint for provider | http://localhost:11434 |
| Supported endpoints | /v1/models, /v1/completions, /v1/chat/completions, /v1/responses, /v1/embeddings |
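The provider route in the table has the form `ollama/<model>`. As a minimal sketch of how a client might take such a route apart, the helper below splits on the first slash only, so model tags such as `llama3.2:1b` survive intact. The function name `split_route` is hypothetical and is not part of Bifrost or Ollama.

```python
def split_route(route: str) -> tuple[str, str]:
    """Split a 'provider/model' route string into (provider, model).

    Illustrative helper only; not a Bifrost or Ollama API.
    """
    # partition() splits on the first "/" so the model part may contain ":" tags.
    provider, sep, model = route.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {route!r}")
    return provider, model


print(split_route("ollama/llama3.3"))  # ('ollama', 'llama3.3')
```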
Before you begin, you will need:
[ QUICK START ]
Get Ollama for your operating system.
Go to ollama.ai and download the installer for macOS, Linux, or Windows.
Open a terminal and run a pull command to download a model.
$ ollama pull llama2
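To confirm that a pull succeeded, one option is to read Ollama's `/api/tags` endpoint, which lists locally installed models. The sketch below assumes the response has a top-level `models` list whose entries carry a `name` such as `llama2:latest`; the helper names are hypothetical, and the demo runs on sample data rather than a live server.

```python
import json
import urllib.request


def installed_models(tags_payload: dict) -> set[str]:
    """Collect model names from an /api/tags style payload.

    Each name is stored both with and without its ":tag" suffix.
    """
    names: set[str] = set()
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name:
            names.add(name)                # e.g. "llama2:latest"
            names.add(name.split(":")[0])  # e.g. "llama2"
    return names


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the installed-model list from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Demo on sample data shaped like the assumed response (no network needed).
sample = {"models": [{"name": "llama2:latest"}]}
print("llama2" in installed_models(sample))  # True
```

Against a live server, `installed_models(fetch_tags())` would give the same kind of set.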
The server runs locally on port 11434.
Ollama starts the server automatically, or you can run it explicitly:
$ ollama serve
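A quick way to check whether the server is accepting connections is to open a TCP connection to port 11434. This is a minimal sketch using only the standard library; `is_listening` is a hypothetical helper name, and a successful connection only shows that something is listening on the port, not that it is Ollama.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 11434, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If `is_listening()` returns `False`, starting the server with `ollama serve` is the first thing to try.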
Enable access from other machines if needed.
By default, Ollama listens only on localhost. To allow network access, set the OLLAMA_HOST environment variable:
$ export OLLAMA_HOST=0.0.0.0:11434
$ ollama serve
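On the client side, the same `OLLAMA_HOST` value can be turned into a base URL. The sketch below is one possible interpretation, not Ollama's own parsing logic: it assumes a plain `host:port` or `host` value (no IPv6 handling), defaults the port to 11434, and maps the bind address `0.0.0.0` to `127.0.0.1`, since `0.0.0.0` is a listen address rather than a destination. From another machine you would use the server's LAN address instead.

```python
import os

DEFAULT_PORT = 11434


def ollama_base_url(env=None) -> str:
    """Derive a client base URL from an OLLAMA_HOST style value.

    Hypothetical helper; accepts any mapping so it can be tested without
    touching the real environment.
    """
    env = os.environ if env is None else env
    raw = env.get("OLLAMA_HOST", "").strip()
    if not raw:
        return f"http://127.0.0.1:{DEFAULT_PORT}"
    if "://" in raw:
        # Already a full URL; just normalize the trailing slash.
        return raw.rstrip("/")
    host, _, port = raw.partition(":")
    if host in ("", "0.0.0.0"):
        host = "127.0.0.1"  # bind-all address is not a valid destination
    return f"http://{host}:{port or DEFAULT_PORT}"


print(ollama_base_url({"OLLAMA_HOST": "0.0.0.0:11434"}))  # http://127.0.0.1:11434
```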
Generate text using your local model.
Call your local Ollama instance:
$ curl http://localhost:11434/api/generate \
    -d '{ "model": "llama2", "prompt": "Hello from local LLM!" }'
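The same call can be made from Python with the standard library. This sketch adds `"stream": false` to the payload so the server returns a single JSON object instead of a stream of chunks, and it assumes the generated text arrives in the `response` field. The function names are hypothetical; building the request is kept separate from sending it so the payload can be inspected without a running server.

```python
import json
import urllib.request


def build_generate_request(prompt: str, model: str = "llama2",
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs), timeout=120) as resp:
        return json.load(resp)["response"]
```

With the server running and the model pulled, `generate("Hello from local LLM!")` would return the model's reply as a string.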
[ MODELS ]
| Model | API ID | Best for |
|---|---|---|
| Llama 3.3 | llama3.3 | Latest Meta Llama for local chat. |
| Llama 3.2 | llama3.2 | Small Llama 3.2 text models; the vision variants are published separately as llama3.2-vision. |
| Llama 3.1 | llama3.1 | Stable Llama 3.1 family. |
| Mistral | mistral | Compact Mistral 7B locally. |
| Qwen 2.5 | qwen2.5 | Strong open Qwen for coding. |
| DeepSeek R1 | deepseek-r1 | Local reasoning model. |
| Gemma 2 | gemma2 | Google Gemma locally. |
| Phi-3 | phi3 | Small Microsoft Phi models. |
| Codestral | codestral | Code-focused Mistral model. |
| nomic-embed-text | nomic-embed-text | Local embeddings for RAG. |
Models and availability change over time. See the Ollama model library for the latest list.
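For the embedding model in the table, a typical RAG step is to embed two texts and compare them. The sketch below builds a request for the `/v1/embeddings` endpoint listed earlier, assuming an OpenAI-style body (`model`, `input`) and an OpenAI-style response (`data[i].embedding`). The helper names are hypothetical, and the similarity function is plain cosine similarity.

```python
import json
import math
import urllib.request


def build_embeddings_request(texts: list[str], model: str = "nomic-embed-text",
                             base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vectors_from_response(payload: dict) -> list[list[float]]:
    """Extract embedding vectors from an OpenAI-style response body."""
    return [item["embedding"] for item in payload.get("data", [])]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```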
[ TROUBLESHOOTING ]
| Issue | Likely Cause | What to Do |
|---|---|---|
| Connection refused | Ollama server not running. | Start the Ollama server with `ollama serve` or check that it is running in the system tray. |
| Out of memory | Model is too large for your system. | Use a smaller model like `neural-chat` or add more RAM to your system. |
| Model not found | Model hasn't been downloaded yet. | Run `ollama pull <model>` to download the model before using it. |
| GPU not detected | GPU drivers not installed or not supported. | Install NVIDIA or AMD drivers. Ollama falls back to CPU inference if no GPU is available. |
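Two of the rows above can be detected programmatically when calling the server with `urllib`. This is a rough sketch, not an exhaustive error handler: it assumes a missing model surfaces as an HTTP 404, which should be verified against your Ollama version, and `diagnose` is a hypothetical helper name.

```python
import urllib.error


def diagnose(exc: Exception) -> str:
    """Map a few common failures to the remedies from the troubleshooting table."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            # Assumption: Ollama answers 404 when the model is not installed.
            return "Model not found: run `ollama pull <model>` first."
        return f"Server returned HTTP {exc.code}."
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, ConnectionRefusedError):
        return "Connection refused: start the server with `ollama serve`."
    if isinstance(exc, ConnectionRefusedError):
        return "Connection refused: start the server with `ollama serve`."
    return f"Unrecognized error: {exc!r}"
```

A caller would wrap its `urllib.request.urlopen` call in `try`/`except Exception as exc` and print `diagnose(exc)`.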
[ PRODUCTION-READY ]
Bifrost is a drop-in replacement for Ollama SDKs. Update your base URL and keep your client code. Bifrost handles virtual keys, cost tracking, and failover to cloud providers.
Run the Bifrost gateway and configure your local Ollama instance in the Web UI.
$ npx -y @maximhq/bifrost
✓ Bifrost started
├─ HTTP server listening on http://localhost:8080
├─ Web UI available at http://localhost:8080
└─ Configure providers and virtual keys in the dashboard
Update your OpenAI SDK to route through Bifrost's unified gateway.
from openai import OpenAI

client = OpenAI(
    api_key="sk-bf-your-virtual-key",
    base_url="http://localhost:8080/openai",
)

response = client.chat.completions.create(
    model="ollama/llama2",
    messages=[{"role": "user", "content": "Hello from Bifrost!"}],
)
print(response.choices[0].message.content)
Pass your virtual key in the x-bf-vk header or as Authorization: Bearer sk-bf-*, per the Bifrost documentation.
After local Ollama is set up, add cloud providers like Groq or Mistral for intelligent failover.
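The two ways of passing a virtual key mentioned above can be sketched as a small header builder for clients that do not use the OpenAI SDK. The function name is hypothetical, and the `sk-bf-` prefix check simply mirrors the key format shown on this page; consult the Bifrost documentation for the authoritative rules.

```python
def bifrost_auth_headers(virtual_key: str, use_bearer: bool = False) -> dict[str, str]:
    """Build request headers carrying a Bifrost virtual key.

    use_bearer=False -> x-bf-vk header
    use_bearer=True  -> Authorization: Bearer header
    """
    if not virtual_key.startswith("sk-bf-"):
        raise ValueError("virtual keys on this page use the 'sk-bf-' prefix")
    if use_bearer:
        return {"Authorization": f"Bearer {virtual_key}"}
    return {"x-bf-vk": virtual_key}


print(bifrost_auth_headers("sk-bf-your-virtual-key"))
# {'x-bf-vk': 'sk-bf-your-virtual-key'}
```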
[ WHAT'S NEXT ]
Your local Ollama setup is running. Add governance, guardrails, and MCP controls for production.
[ BIFROST FEATURES ]
Everything you need to run AI in production, from free open source to enterprise-grade features.
01 Governance
SAML-based SSO, role-based access control, and policy enforcement for team collaboration.
02 Adaptive Load Balancing
Automatically optimizes traffic distribution across provider keys and models based on real-time performance metrics.
03 Cluster Mode
High availability deployment with automatic failover and load balancing. Peer-to-peer clustering where every instance is equal.
04 Alerts
Real-time notifications for budget limits, failures, and performance issues on Email, Slack, PagerDuty, Teams, Webhook and more.
05 Log Exports
Export and analyze request logs, traces, and telemetry data from Bifrost with enterprise-grade data export capabilities for compliance, monitoring, and analytics.
06 Audit Logs
Comprehensive logging and audit trails for compliance and debugging.
07 Vault Support
Secure API key management with HashiCorp Vault, AWS Secrets Manager, Google Secret Manager, and Azure Key Vault integration.
08 VPC Deployment
Deploy Bifrost within your private cloud infrastructure with VPC isolation, custom networking, and enhanced security controls.
09 Guardrails
Automatically detect and block unsafe model outputs with real-time policy enforcement and content moderation across all agents.
[ SHIP RELIABLE AI ]
Change just one line of code. Works with OpenAI, Anthropic, Vercel AI SDK, LangChain, and more.
[ FAQ ]
Yes, Ollama is completely free and open source. Run it on your local machine with no cloud costs or subscription fees.
Ollama supports thousands of open-source models including Llama 2, Mistral, Neural Chat, Code Llama, and others. Download models from the Ollama library.
No, Ollama runs locally and requires no API key. By default it listens only on localhost. If you expose it to a network, add authentication in front of it, for example with a reverse proxy.
Ollama runs on macOS, Linux, and Windows. At least 8GB of RAM is recommended for small models, and larger models require more memory. GPU support is optional but recommended.
Yes, with Bifrost. Bifrost provides OpenAI-compatible routing for Ollama models, allowing you to use standard SDKs with local LLMs.
Bifrost connects to your local Ollama instance, providing virtual keys, cost tracking, and multi-provider failover. Run Ollama locally and configure Bifrost to route to it.
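The memory guidance in the FAQ above can be made concrete with back-of-the-envelope arithmetic: weight memory is roughly parameter count times bits per weight divided by 8. The sketch below is an estimate under stated assumptions, not a measurement. The 4-bit default and the 1.2 overhead factor (for context cache and runtime buffers) are illustrative guesses, and actual usage depends on the quantization and context length.

```python
def estimate_model_memory_gb(params_billions: float, bits_per_weight: int = 4,
                             overhead: float = 1.2) -> float:
    """Rough memory estimate in GB (10^9 bytes) for loading a model.

    overhead is an assumed multiplier for runtime buffers, not a measured value.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9


# A 7B model at 4 bits per weight: 3.5 GB of weights, roughly 4.2 GB with overhead.
print(round(estimate_model_memory_gb(7), 1))
```

Under these assumptions a 7B model fits comfortably in 8GB of RAM, while a 70B model at the same precision would need around 42 GB.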