In our increasingly connected world, we take reliable internet access for granted (or even electricity!). But what happens when that connection disappears? Whether you’re in a remote location, riding out a natural disaster, or simply want to maintain your digital sovereignty (actually my main concern), having access to knowledge without relying on cloud services becomes crucial.
We’ve gotten used to asking AI our questions, and without internet we’re lost.
Well, well… This is exactly why I built zim-llm! It’s a complete system for creating your own offline knowledge base from compressed Wikipedia-style ZIM content and local, offline LLMs.
The Problem: Digital Fragility
- Remote Locations and Travel: When traveling to remote areas (hiking in national parks, sailing offshore, working in rural communities, you name it), internet connectivity can be spotty or nonexistent. Yet access to reliable information may be more critical than ever in these situations (prepper mindset, anyone?).
- Emergency Scenarios: Natural disasters, power outages (ahem, Spain…), or cyber incidents can disrupt internet services for long periods. A local knowledge base means you can still access critical information about medical emergencies, survival techniques, or technical troubleshooting.
- Digital Sovereignty and Privacy: Not everyone wants their queries sent to corporate servers (I don’t). zim-llm runs entirely on your local machine, so your questions and the AI’s responses remain private (it runs offline, on your computer, after all).
Chunk strategy: automatic structure-first splitting
Wikipedia-style ZIM articles are just looong HTML pages. You can’t usefully embed a whole article as a single vector (context limits, noisy retrieval), but chopping naively at a fixed character count cuts through sentences and splits words in half. zim-llm uses recursive character splitting, so chunking is automatic and structure-aware without you picking a “mode” per file. It simply seemed like the right strategy.
After the HTML is stripped to plain text (scripts/styles removed, whitespace normalized), each article is fed to LangChain’s RecursiveCharacterTextSplitter with a priority list of separators: paragraph breaks (\n\n), line breaks (\n), sentence-like breaks (. ), then spaces, and finally single characters as a last resort. The algorithm keeps pieces under a target size by trying the coarsest separator first and only moving to finer ones when a segment is still too large, so boundaries tend to land on paragraphs and lines before they fall back to mid-line cuts.
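As a rough illustration of the stripping step, here is a stdlib-only sketch (not zim-llm’s actual code) that drops script/style contents, turns block tags into paragraph breaks so the splitter still has \n\n boundaries to work with, and normalizes whitespace:

```python
from html.parser import HTMLParser
import re

class TextExtractor(HTMLParser):
    """Collect visible text while skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("p", "br", "div", "h1", "h2", "h3"):
            self.parts.append("\n\n")  # block tags become paragraph breaks

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces/tabs
    text = re.sub(r"\n\s*\n", "\n\n", text)  # at most one blank line in a row
    return text.strip()
```

A real cleaner handles more tags (tables, lists, footnote markup), but the shape is the same: strip the noise, keep the paragraph structure.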
Defaults are tuned for offline setups: a chunk size of 1000 characters and 200 characters of overlap between consecutive chunks (both overridable via config.json). Overlap matters because the “right” answer might sit on a boundary; a little duplication improves the odds that retrieval returns a self-contained snippet.
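To make the separator ladder concrete, here is a simplified, stdlib-only sketch of the idea. It is not LangChain’s actual implementation (which also merges small pieces back together and adds the overlap); it only shows the recursion from coarse to fine separators:

```python
def recursive_split(text, chunk_size=1000,
                    separators=("\n\n", "\n", ". ", " ", "")):
    """Sketch of separator-ladder splitting: try the coarsest separator
    first, and only recurse with finer ones on pieces still too large."""
    if len(text) <= chunk_size:
        return [text]
    sep, *rest = separators
    if sep == "":
        # Last resort: hard cut at chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            chunks.extend(recursive_split(piece, chunk_size, tuple(rest)))
    return [c for c in chunks if c]
```

Note how a text with paragraph breaks never reaches the character-level fallback: the \n\n split already produces small enough pieces, which is exactly why boundaries tend to land on paragraphs.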
Why this style of chunking (and not others)?
- Plain fixed-width windows are fast but ignore document structure. They maximize accidental splits mid-entity or mid-definition (and that’s not good).
- Token-based splitters tie chunking to one tokenizer and add coupling when the embedder and the chat model are different stacks. Here, character recursion stays simple and portable on modest hardware (and I don’t want to deal with tokenizers).
- Semantic / embedding-driven chunkers (split when cosine similarity between adjacent sentences drops) can be super cool but cost extra embedding work at index time and complicate pipelines aimed at long, one-off builds of huge ZIMs (and domestic computers are not exactly the best hardware for this).
Recursive splitting is a good default for encyclopedia HTML: lots of headings and paragraphs map naturally to the separator ladder, and the extra CPU cost is negligible compared to embedding millions of chunks. Again, remember we are using normal computers, not the best hardware for this.
LangChain for the RAG loop (and why LangGraph is not used here)
zim-llm is intentionally a classic retrieve-then-generate pipeline, not an agent with a branching control flow. LangChain is used where it reduces glue code:
- RecursiveCharacterTextSplitter for the chunking step described above.
- Vector store integrations (Chroma/FAISS) so the same embedding model name stays consistent between indexing and query-time store construction.
- PromptTemplate for a single, explicit instruction: use the retrieved ZIM context, and admit ignorance if the context is insufficient (because we are using local LLMs).
- RetrievalQA with chain_type="stuff": retrieve the top-k chunks, concatenate them into the prompt, and run one LLM call (Docker Model Runner, Ollama, or a hosted API, depending on config).
That is a straight line: query → embeddings → similarity search → stuffed context → answer. There is no cycle of tool calls, no planner revisiting the graph, and no shared mutable state that must be passed between disparate nodes each turn.
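That straight line can be sketched in a few lines of plain Python. The toy 2-dimensional vectors and the prompt wording below are illustrative stand-ins for the real sentence-transformers embeddings and the Chroma/FAISS similarity search:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def stuff_prompt(query_vec, index, k=2):
    """index: list of (embedding, chunk_text) pairs.
    Returns the 'stuffed' prompt a chain_type='stuff' setup sends the LLM."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    context = "\n\n".join(text for _, text in ranked[:k])
    return (
        "Answer using ONLY the context below. "
        "If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: ..."
    )

# Toy usage: the chunk closest to the query vector ends up in the prompt.
index = [((1.0, 0.0), "Engineers apply science to practical problems."),
         ((0.0, 1.0), "Rivers erode their banks over time.")]
print(stuff_prompt((0.9, 0.1), index, k=1))
```

Everything an agent framework would add (tool selection, replanning, shared state) is absent by design: one retrieval, one prompt, one call.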
LangGraph is built around what its docs call nodes, edges, and state, and is designed for multi-step, ReAct-style agents. zim-llm does not implement that pattern: retrieval is a single step, and the “tools” collapse into one tool, the actual retrieval function, not a collection for the LLM to choose from. Using LangGraph here would simply be overkill. For offline RAG over a static vector index, a thin LangChain chain stays easier to run, inspect, and maintain.
Quick Setup Guide
Getting started with zim-llm is straightforward. Here’s how to set up your offline knowledge base:
1. Install Dependencies
Clone the zim-llm repository and run the setup script:
git clone https://github.com/rouralberto/zim-llm.git
cd zim-llm
./setup.sh
This will create a virtual environment and install all necessary dependencies, including:
- libzim for reading ZIM files
- sentence-transformers for creating embeddings
- ChromaDB or FAISS for vector storage
- LangChain for the RAG pipeline
2. Add Knowledge Sources
Download ZIM files from the Kiwix Library and place them in the zim_library directory:
# Example: Download and add engineering content
curl -L -o zim_library/engineering.zim "https://download.kiwix.org/zim/libretexts/libretexts.org_en_eng_2025-01.zim"
# Or manually copy files
cp ~/Downloads/*.zim ./zim_library/
3. Build Your Vector Database
Activate the virtual environment and build your knowledge base:
# Activate the virtual environment
source zim_rag_env/bin/activate
# Build the knowledge base
python zim_rag.py build
This process:
- Extracts articles from ZIM files
- Cleans and chunks the text content
- Creates embeddings using sentence-transformers
- Stores everything in a vector database for fast retrieval
Note: First-time setup can take several hours for large ZIM files, but subsequent queries are nearly instantaneous.
4. Start Querying
You’re now ready to query your offline knowledge base:
# Simple semantic search
python zim_rag.py query "What is an engineer?"
# Full RAG with AI-generated answers
python zim_rag.py rag-query "Explain Amplitude Quantization"
# Get system information
python zim_rag.py info
Real-World Use Cases
- Emergency Preparedness: Imagine a scenario where internet services are down during a crisis. With zim-llm, you can still access medical emergency procedures, water purification techniques, first aid instructions, and disaster response protocols.
- Field Research and Exploration: Researchers in remote locations can carry comprehensive knowledge bases covering their field of study without relying on satellite internet.
- Education in Low-Connectivity Areas: Students and educators in areas with poor internet can access extensive educational content through local, searchable knowledge bases.
- Digital Nomads and Off-Grid Living: Maintain access to reference materials, documentation, and learning resources without monthly data limits or connectivity concerns.
By combining the vast knowledge of projects like Wikipedia with modern AI techniques, we can create tools that work reliably even when disconnected from the cloud.
Whether you’re preparing for emergencies, conducting field research, or simply value your digital privacy, having an offline knowledge base powered by local AI gives you the freedom to learn and discover without boundaries.