Running AI models locally gives you complete control, privacy, and zero dependency on internet connectivity. This guide walks you through setting up Open WebUI with Ollama—a ChatGPT-like interface running entirely on your machine.
What You’ll Need
- Docker Desktop
- Ollama (local LLM runtime)
- About 15 minutes
- At least 8GB of free disk space for models
Understanding the Architecture
Before we dive in, here’s how the components work together:
The Flow:
- You interact with Open WebUI in your browser (the polished ChatGPT-like interface)
- Open WebUI runs inside Docker Desktop (handles all dependencies automatically)
- Docker communicates with Ollama running on your host machine (the AI engine)
- Ollama loads models from your hard drive and runs inference, using your GPU (when available) for speed
Why not just use Ollama’s built-in interface? Ollama does have a simple command-line interface and an HTTP API, but it’s minimal—think of it as the engine without the dashboard. Open WebUI provides the full-featured experience: conversation history, document uploads, model switching, multi-user support, and more.
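A quick way to see this split in practice: Ollama exposes a local HTTP API (port 11434 by default), and Open WebUI is simply a client of that API. Once you’ve finished Step 2, you can query it directly from a terminal:
curl http://localhost:11434/api/tags
This returns a JSON list of the models you’ve pulled—the same list Open WebUI shows in its model selector.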
Step 1: Install Docker Desktop
Docker provides the containerized environment for Open WebUI.
- Download Docker Desktop: https://docs.docker.com/desktop/
- Install and launch the application
- Verify installation by opening a terminal and running:
docker --version
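If you want more than a version string, a quick throwaway container confirms Docker can pull and run images end to end:
docker run --rm hello-world
The --rm flag removes the test container as soon as it exits.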
Step 2: Install Ollama
Ollama runs the AI models locally on your machine.
- Download Ollama: https://ollama.com/download
- Install for your operating system
- Verify installation:
ollama --version
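Ollama also starts a local API server on port 11434 (automatically with the desktop app, or manually via ollama serve). A quick way to confirm it is listening:
curl http://localhost:11434
A healthy install responds with a short “Ollama is running” message.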
Step 3: Download an AI Model (Critical Step!)
You must download at least one model before launching Open WebUI. Without a model, Open WebUI will have nothing to talk to.
Download a model using Ollama:
ollama pull llama3.2
This downloads the Llama 3.2 model (roughly 2GB for the default 3B variant). Other popular options:
- ollama pull llama3.2:1b (smaller, faster, less capable)
- ollama pull mistral (alternative model)
- ollama pull codellama (optimized for coding)
You can download multiple models and switch between them in Open WebUI.
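To confirm what’s on disk and smoke-test a model from the terminal before involving the UI:
ollama list
ollama run llama3.2 "Say hello in one sentence."
ollama list shows every downloaded model with its size; the run command sends a one-off prompt and exits after the reply.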
Step 4: Launch Open WebUI
Run this single command to start Open WebUI:
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
What this does:
- -p 3000:8080 – Maps port 3000 on your machine to port 8080 inside the container
- --add-host=host.docker.internal:host-gateway – Enables communication with Ollama running on your host machine
- -v open-webui:/app/backend/data – Creates a persistent Docker volume for your data
- --restart always – Automatically restarts the container if it stops
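Before opening the browser, it’s worth checking that the container came up cleanly (the name here matches the command above):
docker ps --filter name=open-webui
docker logs open-webui
docker ps should list the container as Up, and the logs show Open WebUI’s startup output, including any errors reaching Ollama.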
Where is my data stored? Open WebUI stores all your conversations, settings, and uploaded documents in a Docker volume called open-webui. This data persists across:
- Container restarts and updates
- Docker Desktop restarts
- Computer restarts
Your data is safe as long as you don’t delete the Docker volume itself. To find the exact location:
docker volume inspect open-webui
On most systems, Docker volumes are stored in:
- Mac: ~/Library/Containers/com.docker.docker/Data/vms/0/
- Windows: \\wsl$\docker-desktop-data\data\docker\volumes\
- Linux: /var/lib/docker/volumes/open-webui/
Step 5: Access Your Interface
Open your browser and navigate to:
http://localhost:3000
Create an account (stored locally in the Docker volume) and start chatting with your AI models. You’ll see your downloaded Ollama models available in the model selector.
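If the page doesn’t load right away, Open WebUI may still be initializing on first start. You can check from the terminal (assuming you kept port 3000 from Step 4):
curl -I http://localhost:3000
An HTTP 200 response means the interface is up; otherwise, check the container logs as shown in Step 4.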
Why This Setup?
For Program Managers:
- No vendor lock-in: Full control over your AI infrastructure
- Data privacy: Everything stays on your machine—conversations, documents, all of it
- Predictable costs: No usage-based pricing
- Offline capable: Works without internet after initial setup
- Customizable: Add models and adjust configurations as needed
- Full-featured UI: Unlike Ollama’s basic interface, Open WebUI offers document uploads, conversation management, and team collaboration features
GPU Acceleration
Ollama automatically detects and uses your GPU if available:
- NVIDIA GPUs: Uses CUDA (10-100x faster than CPU)
- Apple Silicon: Uses Metal acceleration
- AMD GPUs: Uses ROCm
You’ll see dramatically faster response times with GPU acceleration. To confirm GPU detection, check Ollama’s startup logs (if the background service is already running, stop it first, then start it manually in a terminal):
ollama serve
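If your Ollama version includes the ps subcommand (available in recent releases), it gives a quicker answer than reading logs:
ollama run llama3.2 "warm up"
ollama ps
The PROCESSOR column in the ollama ps output shows whether the loaded model is running on the GPU or falling back to CPU.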
Troubleshooting
“No models available” in Open WebUI?
You forgot to download a model! Run ollama pull llama3.2 and refresh Open WebUI.
Can’t connect to Ollama?
Ensure Ollama is running. On most systems it runs automatically, but you can start it manually with ollama serve.
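If Ollama is running but Open WebUI still can’t reach it, you can point the container at the host explicitly using Open WebUI’s OLLAMA_BASE_URL environment variable (a sketch assuming Ollama’s default port 11434). Remove the old container first so the name is free; your data lives in the volume, not the container:
docker rm -f open-webui
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main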
Port 3000 already in use?
Change -p 3000:8080 to -p 3001:8080 (or any available port) and access at localhost:3001.
Container won’t start?
Check that Docker Desktop is running and that you have sufficient disk space.
Want to back up your conversations?
Export the Docker volume: docker run --rm -v open-webui:/data -v $(pwd):/backup ubuntu tar czf /backup/open-webui-backup.tar.gz /data
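To restore that archive later onto the same (or a fresh) volume, a matching sketch, assuming the backup file sits in your current directory and the Open WebUI container is stopped:
docker run --rm -v open-webui:/data -v $(pwd):/backup ubuntu tar xzf /backup/open-webui-backup.tar.gz -C /
This unpacks the archived data directory back into the open-webui volume.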
This setup provides a production-ready local AI assistant in under 15 minutes—no cloud dependencies required.