Docker for LLMs! Docker did it again :_D
There was a time (like, a thousand years ago) when running AI language models locally required specialized knowledge, complex environment setups, and powerful hardware. Those days are officially over. Docker Model Runner has completely changed the game, making it ridiculously (and scarily) simple to run LLMs right on your personal computer. If you’ve been following my work, you might have seen my earlier posts about LLMs in general, but this new Docker feature takes AI accessibility to a whole new level.
What Problem Does Docker Model Runner Solve?
Running large language models locally has traditionally been a nightmare of environment configurations, dependency management, and hardware requirements. Even for experienced developers, the process was time-consuming and error-prone. Good news! Docker has made this complex process shockingly simple.

The Game-Changer
Docker Model Runner is a feature in Docker Desktop that lets you download and run AI models locally with minimal friction. There’s:
- No complex environment setup
- No dependency management nightmares
- No compatibility issues
- No need for specialized hardware (in most cases)
If you have Docker Desktop installed, you’re literally two commands away from having your own ChatGPT-like assistant running locally. This reminds me of how containerization revolutionized application deployment years ago, which I covered in many articles about Docker.
How Easy Is It?
Let me demonstrate with DeepSeek R1 Distill Llama, a powerful and efficient model I’ve been testing on my MacBook Pro M2 Max:
First, pull the model:
docker model pull ai/deepseek-r1-distill-llama:8B-Q4_K_M
Then run it with a simple command:
docker model run ai/deepseek-r1-distill-llama:8B-Q4_K_M 'What are three benefits of running LLMs locally?'
That’s it! Seriously haha. No configuration files, no Python environment setup, no GPU drivers to install. The model downloads in minutes and responds immediately. What would have taken hours of setup and troubleshooting before now happens in seconds.
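One bonus worth mentioning: as far as I can tell from recent Docker Desktop builds (double-check yours), omitting the prompt drops you into an interactive chat session right in the terminal:
docker model run ai/deepseek-r1-distill-llama:8B-Q4_K_M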
Why Local LLMs Are a Big Deal
Running AI models locally offers several critical advantages that can’t be overlooked:
- Complete privacy: Your data never leaves your device - crucial for sensitive work
- No API costs: Use the model as much as you want without worrying about usage fees
- Works offline: Perfect for traveling or unreliable internet connections
- Lower latency: Responses come faster without network delays
- Customization potential: Possibility to fine-tune for your specific needs
This approach to running local AI models follows similar containerization principles that I discussed in my article about Docker.
The Performance
The speed is what shocked me the most. Sure, I’m running on a “good” computer, but it’s still just a laptop. Using DeepSeek R1 as my test case, the model runs super fast on my MacBook Pro M2 Max. Responses come with minimal lag, making it practical for real-world applications.
I was honestly expecting to deal with the typical tradeoff between accessibility and speed - you know, making something easier to use usually means compromising on capabilities. Not here. The days of needing specialized hardware to run capable AI models are quickly fading.
The response takes less than 500ms on my local machine. Yeah, I’m still processing that.
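If you want to sanity-check that on your own hardware, a quick-and-dirty way is to time a one-shot prompt from the shell (a minimal sketch; the wall-clock figure includes CLI startup, so treat it as an upper bound rather than pure inference time):
time docker model run ai/deepseek-r1-distill-llama:8B-Q4_K_M 'Say hi in five words'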
Your Own Local LLM
Here’s how to start running LLMs locally in less than 5 minutes:
- Install the latest Docker Desktop
- Enable Docker Model Runner in Settings > Features in development
- Use docker model pull <model> to download your chosen model
- Run prompts with docker model run <model> <prompt>
For more options and available models:
docker model list
docker model help
What’s particularly amazeballs is that Docker Model Runner exposes an OpenAI-compatible API, meaning you can easily integrate these local models into applications already built for services like ChatGPT.
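Here’s a sketch of what a chat completion against the local endpoint looks like. I’m assuming you’ve enabled host-side TCP access for Model Runner in Docker Desktop’s settings and that it listens on the default port 12434; the exact base URL can vary by version, and from inside a container you’d hit model-runner.docker.internal instead of localhost:
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/deepseek-r1-distill-llama:8B-Q4_K_M",
        "messages": [
          {"role": "user", "content": "What are three benefits of running LLMs locally?"}
        ]
      }'
Since the request and response shapes mirror OpenAI’s Chat Completions API, most existing clients only need their base URL swapped to point at the local endpoint.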
Other Models to Try
While I’ve highlighted DeepSeek R1 in my examples (which has been impressive), Docker Model Runner supports various models through Docker Hub’s ai namespace. Different models offer different capabilities, sizes, and performance characteristics, letting you choose based on your specific needs and hardware constraints.
As an example, ai/llama3.2:1B-Q8_0 occupies 1.22GB, while ai/deepseek-r1-distill-llama:8B-Q4_K_M takes 4.58GB (and the :70B-Q4_K_M variant weighs in at 42GB).
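If you’re working with less RAM or disk, start with the smaller tags and clean up what you don’t use. A quick sketch (I’m assuming your Docker Desktop build ships the rm subcommand; docker model help will confirm):
# grab the small 1.22GB model
docker model pull ai/llama3.2:1B-Q8_0
# see what's on disk
docker model list
# remove it when you're done (assuming rm is available in your version)
docker model rm ai/llama3.2:1B-Q8_0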
I’m truly impressed. The barrier to entry for running capable AI models locally has never been lower. As I said, Docker did it again!