Docker for LLMs! Docker did it again :_D
There was a time (like, a thousand years ago) when running AI language models locally required specialized knowledge, complex environment setups, and powerful hardware. Those days are officially over. Docker Model Runner has completely changed the game, making it ridiculously (and scarily) simple to run LLMs right on your personal computer. If you’ve been following my work, you might have seen my earlier posts about LLMs in general, but this new Docker feature takes AI accessibility to a whole new level.
What Problem Does Docker Model Runner Solve?
Running large language models locally has traditionally been a nightmare of environment configurations, dependency management, and hardware requirements. Even for experienced developers, the process was time-consuming and error-prone. Good news! Docker has made this complex process shockingly simple.

The Game-Changer
Docker Model Runner is a feature in Docker Desktop that lets you download and run AI models locally with minimal friction. There’s:
- No complex environment setup
- No dependency management nightmares
- No compatibility issues
- No need for specialized hardware (in most cases)
If you have Docker Desktop installed, you’re literally two commands away from having your own ChatGPT-like assistant running locally. This reminds me of how containerization revolutionized application deployment years ago, which I covered in many articles about Docker.
How Easy Is It?
Let me demonstrate with DeepSeek R1 Distill Llama, a powerful and efficient model I’ve been testing on my MacBook Pro M2 Max:
First, pull the model:
docker model pull ai/deepseek-r1-distill-llama:8B-Q4_K_M
Then run it with a simple command:
docker model run ai/deepseek-r1-distill-llama:8B-Q4_K_M 'What are three benefits of running LLMs locally?'
That’s it! Seriously haha. No configuration files, no Python environment setup, no GPU drivers to install. The model downloads in minutes and responds immediately. What would have taken hours of setup and troubleshooting before now happens in seconds.
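One bonus worth mentioning: as far as I can tell from recent Docker Desktop builds (double-check yours), omitting the prompt drops you into an interactive chat session right in the terminal:
docker model run ai/deepseek-r1-distill-llama:8B-Q4_K_M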
Why Local LLMs Are a Big Deal
Running AI models locally offers several critical advantages that can’t be overlooked:
- Complete privacy: Your data never leaves your device - crucial for sensitive work
- No API costs: Use the model as much as you want without worrying about usage fees
- Works offline: Perfect for traveling or unreliable internet connections
- Lower latency: Responses come faster without network delays
- Customization potential: Possibility to fine-tune for your specific needs
This approach to running local AI models follows similar containerization principles that I discussed in my article about Docker.
The Performance
The speed is what shocked me the most. Sure, I’m running on a “good” computer, but it’s still just a laptop. Using DeepSeek R1 as my test case, the model runs super fast on my MacBook Pro M2 Max. Responses come with minimal lag, making it practical for real-world applications.
I was honestly expecting to deal with the typical tradeoff between accessibility and speed - you know, making something easier to use usually means compromising on capabilities. Not here. The days of needing specialized hardware to run capable AI models are quickly fading.
The response takes less than 500ms on my local machine. Yeah, I’m still processing that.
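If you want to sanity-check that on your own hardware, a quick-and-dirty way is to time a one-shot prompt from the shell (a minimal sketch; the wall-clock figure includes CLI startup, so treat it as an upper bound rather than pure inference time):
time docker model run ai/deepseek-r1-distill-llama:8B-Q4_K_M 'Say hi in five words'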
Your Own Local LLM
Here’s how to start running LLMs locally in less than 5 minutes:
- Install the latest Docker Desktop
- Enable Docker Model Runner in Settings > Features in development
- Use docker model pull <model> to download your chosen model
- Run prompts with docker model run <model> <prompt>
For more options and available models:
docker model list
docker model help
What’s particularly amazeballs is that Docker Model Runner exposes an OpenAI-compatible API, meaning you can easily integrate these local models into applications already built for services like ChatGPT.
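Here’s a sketch of what a chat completion against the local endpoint looks like. I’m assuming you’ve enabled host-side TCP access for Model Runner in Docker Desktop’s settings and that it listens on the default port 12434; the exact base URL can vary by version, and from inside a container you’d hit model-runner.docker.internal instead of localhost:
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/deepseek-r1-distill-llama:8B-Q4_K_M",
        "messages": [
          {"role": "user", "content": "What are three benefits of running LLMs locally?"}
        ]
      }'
Since the request and response shapes mirror OpenAI’s Chat Completions API, most existing clients only need their base URL swapped to point at the local endpoint.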
Other Models to Try
While I’ve highlighted DeepSeek R1 in my examples (which has been impressive), Docker Model Runner supports various models through Docker Hub’s ai namespace. Different models offer different capabilities, sizes, and performance characteristics, letting you choose based on your specific needs and hardware constraints.
As an example, ai/llama3.2:1B-Q8_0 occupies 1.22GB, while ai/deepseek-r1-distill-llama:8B-Q4_K_M takes 4.58GB (and the :70B-Q4_K_M variant weighs in at 42GB).
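If you’re working with less RAM or disk, start with the smaller tags and clean up what you don’t use. A quick sketch (I’m assuming your Docker Desktop build ships the rm subcommand; docker model help will confirm):
# grab the small 1.22GB model
docker model pull ai/llama3.2:1B-Q8_0
# see what's on disk
docker model list
# remove it when you're done (assuming rm is available in your version)
docker model rm ai/llama3.2:1B-Q8_0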
I’m truly impressed. The barrier to entry for running capable AI models locally has never been lower. As I said, Docker did it again!