Adding a Qwen-powered Memory-Augmented Agent to the NaLog Platform

Four years ago I wrote about revolutionizing rice farming with IoT AWD at KhawTECH. The hardware worked. The LoRaWAN sensors worked. The platform worked. Farmers could see water levels on a phone. What we still didn’t have was the thing that turns data into remembered, field-specific advice… you know, the agronomist who knows that *this* paddy drains faster after re-levelling, that *this* farmer wants to approve the pump himself near flowering, that *last* season AWD here cut pumping by roughly a third with no yield loss, etc.

When the Alibaba Cloud hackathon opened the MemoryAgent track, it felt less like a side project and more like the missing layer of a platform I’ve been shipping in the field for years. NaLog is already proving useful in poor farming communities, so I joined the hackathon in hopes of winning further funding to scale what’s working, not to prototype something from scratch. So I built NaLog Agent: a Qwen-powered Memory-Augmented Agent on top of the live NaLog / KhawTECH IoT platform, deployed on Alibaba Cloud, and open-sourced under MIT.

This is the story of how I built it (and what I learned putting Qwen to work on a real agritech product, not a demo).

Sensors are working. It’s the memory that’s missing.

Smallholder farmers can’t afford big ag-tech. KhawTECH puts affordable AWD (Alternate Wetting and Drying) sensors in their fields. The sensors produce data. Raw data isn’t advice.

A generic chatbot fails here for two reasons:

Grounding: Irrigation advice has to be based on actual water levels, growth stage, and AWD phase. Inventing a number is worse than saying nothing.
Memory: Good guidance is hyper-local and cumulative. It has to survive across sessions and seasons, and forget what no longer matters.

That’s the Memory-Augmented Agent problem in a nutshell. And it’s exactly what I’ve been thinking about since I first wrote about Vector RAG and agentic search (not as an abstract retrieval debate, but as a practical constraint for rural deployments with limited bandwidth and small context windows).

What NaLog Agent does

A farmer asks in Thai or English something like “Does Paddy 3 need pumping now?” (or just attaches a photo of the field), and the agent goes brrrrr:

Reads live sensor data from the NaLog REST API (paddies, AWD cycles, water levels, trends).
Recalls relevant past experience for that farmer and that field.
Explains a recommendation grounded in real numbers, streamed live over SSE (thinking, tool calls, and the reply token by token).
Proposes pump actions for human approval and never switches a pump on its own. Only after approval does a LoRaWAN downlink go to the field via ChirpStack.

And it doesn’t just wait to be asked. A sensor crossing a threshold POSTs to /api/alerts and the agent runs a full reasoning turn unprompted: verifies the alert against live data, recalls the farmer’s history, and prepares a pump proposal on its own — with actuation still behind human approval.

Behind the scenes it runs a bounded ReAct tool loop (up to 6 rounds), persists a 3-tier memory system, and autonomously extracts durable learnings after each turn using a cheap qwen3.6-flash pass.
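To make "bounded" concrete, here is a minimal sketch of a tool loop capped at 6 rounds. It is not the repository's code: `ModelStep`, `runBoundedLoop`, the stub model, and the `get_water_level` tool with its reading are all hypothetical. The model call is synchronous here to keep the example short, where a real one would be async.

```typescript
// Hypothetical types: a model step is either a tool request or a final answer.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelStep =
  | { kind: "tool"; call: ToolCall }
  | { kind: "final"; text: string };

type Observation = { tool: string; result: string };
type Model = (history: Observation[]) => ModelStep;
type Tools = Record<string, (args: Record<string, unknown>) => string>;

const MAX_ROUNDS = 6; // the bound quoted in the post

function runBoundedLoop(model: Model, tools: Tools, maxRounds = MAX_ROUNDS) {
  const history: Observation[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const step = model(history);
    if (step.kind === "final") {
      return { text: step.text, rounds: round, history };
    }
    const handler = tools[step.call.name];
    // An unknown tool becomes an observation instead of crashing the turn.
    const result = handler
      ? handler(step.call.args)
      : `unknown tool: ${step.call.name}`;
    history.push({ tool: step.call.name, result });
  }
  // Budget exhausted: stop calling tools and answer from what was gathered.
  return {
    text: "Tool budget exhausted; answering from gathered data.",
    rounds: maxRounds,
    history,
  };
}

// Stub model (made-up data): ask for one reading, then answer.
const stubModel: Model = (history) =>
  history.length === 0
    ? { kind: "tool", call: { name: "get_water_level", args: { paddy: 3 } } }
    : { kind: "final", text: `Water level: ${history.map((o) => o.result).join(", ")}` };

const stubTools: Tools = { get_water_level: () => "-12 cm" };
const outcome = runBoundedLoop(stubModel, stubTools);
```

The cap matters for cost as much as for safety: a model that keeps requesting tools still ends the turn after a fixed number of calls.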

The same tool handlers also run as an MCP server, so any MCP client can read field state, query memory, and prepare irrigation proposals (one implementation, two surfaces).

Demo video

The ~3-minute walkthrough shows a Thai irrigation query grounded in live sensor data and recalled memory, human-in-the-loop pump approval with a LoRaWAN downlink, cross-session memory in a new conversation, and proof of the backend running on Alibaba Cloud Function Compute.

Architecture on Alibaba Cloud

The full Alibaba Cloud stack:

Service	Role
Model Studio (DashScope)	Qwen reasoning, tool calling, Thai/English NLG, vision
Model Studio (embeddings + rerank)	`text-embedding-v3` for semantic memory, `qwen3-rerank` cross-encoder for recall re-ordering
Tablestore	Profile, episodic memory (with TTL), sessions, HITL proposals
DashVector	Top-K semantic recall of past field experience
Function Compute 3.0	Serverless backend (ZIP custom runtime, no container registry)

I wrote about testing Model Studio back in 2024 and about running Qwen in Cursor earlier this year. NaLog Agent uses the same OpenAI-compatible DashScope endpoint I’ve been using everywhere else:

https://dashscope-intl.aliyuncs.com/compatible-mode/v1

The difference here is how the models are tiered:

qwen3.6-flash: cheap post-turn memory extraction (thinking off)
qwen3.6-plus: composing the farmer-facing reply (thinking off — no reasoning tax on prose)
qwen3.7-max: the agronomic reasoning loop with tool calling (thinking on, streamed to the UI)
qwen3-vl-plus: reading field photos — crop condition, water visibility, pest damage
text-embedding-v3 + qwen3-rerank: episodic memory vectors and recall re-ranking

Hybrid thinking control turned out to be one of the quiet wins of the Qwen3 generation: pay for chain-of-thought exactly where it earns its tokens (tool-use decisions), skip it where it doesn’t (writing two friendly sentences in Thai). On a product where every token costs real money for farmers who can’t afford waste, that tiering matters.
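The tiering can be pictured as a small lookup table that decides the model and the thinking switch per task. This is a sketch under assumptions: the task names, `TIERS`, and `buildRequest` are hypothetical, and `enable_thinking` is assumed to be the request field the compatible-mode endpoint accepts for toggling thinking (check the current Model Studio docs before relying on it). The post does not say whether thinking is set for the vision model, so it is left unset here.

```typescript
type Task = "extract_memory" | "compose_reply" | "reason_with_tools" | "read_photo";

interface Tier {
  model: string;
  thinking?: boolean; // undefined = leave the provider default untouched
}

// Model names and thinking settings as listed in the post.
const TIERS: Record<Task, Tier> = {
  extract_memory: { model: "qwen3.6-flash", thinking: false },
  compose_reply: { model: "qwen3.6-plus", thinking: false },
  reason_with_tools: { model: "qwen3.7-max", thinking: true },
  read_photo: { model: "qwen3-vl-plus" },
};

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  enable_thinking?: boolean; // assumed field name, see the note above
}

// Builds the JSON body for POST {base}/chat/completions on the compatible endpoint.
function buildRequest(task: Task, messages: ChatMessage[]): ChatRequest {
  const tier = TIERS[task];
  const request: ChatRequest = { model: tier.model, messages };
  if (tier.thinking !== undefined) {
    request.enable_thinking = tier.thinking;
  }
  return request;
}
```

Keeping the table in one place means a price change or a new model tier is a one-line edit rather than a hunt through call sites.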

The memory model

Three tiers, deliberately designed for the MemoryAgent track:

Tier	Store	Behaviour
Profile (sticky)	Tablestore	Durable facts: language, irrigation style, preferences
Episodic (decaying)	Tablestore + DashVector	Dated field experience; TTL ~400 days; reinforced on reuse
Semantic recall	DashVector	Embed the situation, return the few most similar memories

Recall is vector-first: DashVector returns the top candidates, Tablestore hydrates them with a single BatchGetRow (point lookups, no scans — O(topK) whatever the history size), and a qwen3-rerank cross-encoder re-orders them against the actual query. Then relevance blends three signals:

score = 0.60 · semantic_rank    (reranked)
      + 0.25 · recency          (half-life ≈ 120 days)
      + 0.15 · reinforcement    (min(reuse_count / 5, 1))

Forgetting is twofold: old memories sink in ranking (soft), and Tablestore TTL physically deletes episodic rows after ~400 days unless they’re rewritten (hard). Recall is deliberately top-K (default 5) and summarised into a compact block — never a full memory dump. That’s what makes it viable on slow rural connections with a small context window.
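The blend above is easy to check by hand. The sketch below uses the weights, half-life, and reinforcement cap from the formula; two things are assumptions: that recency decays exponentially (the post gives only the half-life), and that `semanticRank` arrives already normalised to the range 0 to 1. The names are hypothetical.

```typescript
const HALF_LIFE_DAYS = 120;
const WEIGHTS = { semantic: 0.6, recency: 0.25, reinforcement: 0.15 };

interface Candidate {
  semanticRank: number; // 0..1, from the reranked order (assumed normalisation)
  ageDays: number;
  reuseCount: number;
}

// Assumed exponential decay: the score halves every HALF_LIFE_DAYS.
function recency(ageDays: number): number {
  return Math.pow(0.5, Math.max(ageDays, 0) / HALF_LIFE_DAYS);
}

// Reinforcement saturates after five reuses, as in the formula.
function reinforcement(reuseCount: number): number {
  return Math.min(reuseCount / 5, 1);
}

function blendedScore(c: Candidate): number {
  return (
    WEIGHTS.semantic * c.semanticRank +
    WEIGHTS.recency * recency(c.ageDays) +
    WEIGHTS.reinforcement * reinforcement(c.reuseCount)
  );
}

// Two near-duplicates with identical semantic rank: this season's vs last season's.
const current = blendedScore({ semanticRank: 0.8, ageDays: 30, reuseCount: 2 });
const stale = blendedScore({ semanticRank: 0.8, ageDays: 395, reuseCount: 0 });
```

With the semantic term tied, the recency and reinforcement terms are what push the current memory above its stale twin.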

I didn’t want to just claim the forgetting works, so I benchmarked it (BENCHMARK.md, reproducible offline with npm run bench). The dataset is 42 memories across two seasons, where every current fact has a stale near-duplicate from last season: the case where plain vector search actively misleads. Pure vector search scored 91.7% Recall@5 but let 8 stale twins into the context window. The production 3-tier blend scored 100% Recall@5, with zero cases where an outdated fact outranked the current one. Last season’s trigger level ranking above this season’s is exactly how an agent gives confidently wrong advice.

This is the part where my Vector RAG experience actually paid off. DashVector returns cosine distance (lower is better); the local dev driver returns similarity (higher is better), so the raw numbers aren’t comparable. Ranking by position rather than raw score made recall robust across both environments.
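A small sketch of the position-based idea: sort by whatever the driver returns, then throw the raw number away and keep only the position. `toRankScores` and the `1 - i / n` mapping are assumptions for illustration, not the project's actual normalisation.

```typescript
interface Hit {
  id: string;
  score: number; // raw driver output: a distance or a similarity
}

type Metric = "distance" | "similarity";

// Converts raw scores to position-based scores in (0, 1]; best hit gets 1.
function toRankScores(hits: Hit[], metric: Metric): Map<string, number> {
  const ordered = [...hits].sort((a, b) =>
    metric === "distance" ? a.score - b.score : b.score - a.score
  );
  const n = ordered.length;
  const ranks = new Map<string, number>();
  ordered.forEach((hit, i) => ranks.set(hit.id, 1 - i / n));
  return ranks;
}

// Same two memories seen through both kinds of driver (made-up scores).
const fromDistance = toRankScores(
  [{ id: "a", score: 0.1 }, { id: "b", score: 0.4 }],
  "distance"
);
const fromSimilarity = toRankScores(
  [{ id: "b", score: 0.6 }, { id: "a", score: 0.9 }],
  "similarity"
);
```

Both calls give memory `a` the top position, so everything downstream of recall sees identical input regardless of which driver produced the hits.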

Building on a real platform

This is not a mock dataset dressed up as agritech.

NaLog Agent connects read-only to the live NaLog API, the same one that powers the KhawTECH farming frontend. There is no aggregated “paddy status” endpoint, so the agent stitches paddy + AWD cycle + sensors + latest readings the way an agronomist would mentally compose the picture.

When NALOG_USE_DEMO=true, anyone can run the agent locally with a bundled demo farm and only a DashScope API key. When pointed at production, it reads real sensor data from fields we’ve been deploying since my first KhawTECH posts.

Safety is architectural, not prompt-only. The LLM has no “turn pump on” tool. Pump actions go through propose_irrigation → farmer approval → ChirpStack downlink (0x01 on / 0x00 off). The agent can recommend; the human decides.
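The shape of that guarantee can be sketched as a tiny state machine: the model-facing tool only records a pending proposal, and the downlink byte is produced solely by the approval path. This is an illustration, not the project's code; `proposeIrrigation`, `approveProposal`, the in-memory `Map`, and the error messages are hypothetical, and the actual ChirpStack enqueue call is left out.

```typescript
type ProposalStatus = "pending" | "approved" | "rejected";

interface Proposal {
  id: string;
  farmerId: string;
  paddyId: string;
  action: "on" | "off";
  status: ProposalStatus;
}

// In-memory stand-in for the proposals table.
const proposals = new Map<string, Proposal>();
let nextId = 1;

// The only pump-related tool exposed to the model: it records intent, nothing more.
function proposeIrrigation(farmerId: string, paddyId: string, action: "on" | "off"): Proposal {
  const proposal: Proposal = { id: `p${nextId++}`, farmerId, paddyId, action, status: "pending" };
  proposals.set(proposal.id, proposal);
  return proposal;
}

// Called from the approval endpoint with the verified farmer, never by the model.
// Returns the one-byte downlink payload to hand to the LoRaWAN network server.
function approveProposal(proposalId: string, approverId: string): Uint8Array {
  const proposal = proposals.get(proposalId);
  if (!proposal) throw new Error("unknown proposal");
  if (proposal.farmerId !== approverId) throw new Error("not your proposal");
  if (proposal.status !== "pending") throw new Error(`already ${proposal.status}`);
  proposal.status = "approved";
  // Payload bytes from the post: 0x01 = pump on, 0x00 = pump off.
  return new Uint8Array([proposal.action === "on" ? 0x01 : 0x00]);
}
```

Because the payload only exists as the return value of the approval path, a prompt injection can at worst create another pending proposal.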

Identity is real too: the platform forwards each farmer’s Firebase ID token, and the agent verifies it server-side — RS256 signature against Google’s public certs plus issuer/audience/expiry checks, implemented with plain node:crypto (no firebase-admin, which is a heavy dependency for a serverless function that only needs to verify). Memory and proposals are scoped to the verified farmer; you cannot approve someone else’s pump.
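The claim checks are the part that is easy to show in isolation. The sketch below covers only those checks on an already-decoded token; the RS256 signature verification against Google's certificates is deliberately omitted, and a real verifier must do it before trusting any claim. The issuer format (`https://securetoken.google.com/<projectId>`) and audience (the project ID) follow Firebase's documented rules for ID tokens; the function and type names are hypothetical.

```typescript
interface TokenHeader {
  alg?: string;
  kid?: string;
}

interface TokenClaims {
  iss?: string;
  aud?: string;
  sub?: string;
  exp?: number; // seconds since the Unix epoch
}

type ClaimCheck = { ok: true; uid: string } | { ok: false; reason: string };

// Validates header and claims only. Signature verification is NOT done here.
function checkFirebaseClaims(
  header: TokenHeader,
  claims: TokenClaims,
  projectId: string,
  nowSeconds: number
): ClaimCheck {
  if (header.alg !== "RS256") return { ok: false, reason: "unexpected algorithm" };
  if (claims.iss !== `https://securetoken.google.com/${projectId}`) {
    return { ok: false, reason: "wrong issuer" };
  }
  if (claims.aud !== projectId) return { ok: false, reason: "wrong audience" };
  if (typeof claims.exp !== "number" || claims.exp <= nowSeconds) {
    return { ok: false, reason: "token expired" };
  }
  if (!claims.sub) return { ok: false, reason: "missing subject" };
  // The subject is the Firebase user ID; memory and proposals are scoped to it.
  return { ok: true, uid: claims.sub };
}
```

Returning the `uid` only from the success branch makes it hard to scope memory to an unverified identity by accident.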

Deploying on Function Compute

I’ve been deploying on Function Compute since 2018. For this project I went with FC 3.0’s ZIP-based custom runtime (custom.debian10, bundled Node 20) rather than a GPU container. Because inference runs on Model Studio, the function itself is a lightweight Express 5 service orchestrating tools and memory.

The deploy flow:

npm run provision      # Tablestore tables (with TTL) + DashVector collection
npm run deploy:build   # Linux node_modules via Docker → dist/nalog-agent-fc.zip
npm run deploy:fc      # CreateFunction / UpdateFunction + HTTP trigger

If you want the deep dive on FC itself, I covered GPU-based Qwen hosting in a separate post. NaLog Agent takes the opposite path: tiny ZIP function, heavy lifting on managed Model Studio.

What surprised me

Grounding beats fluency. Tool-calling over live sensor data, with explicit “never invent water levels” rules, produces trust in a domain where a wrong pump decision costs diesel, yield, or both.

Memory is the product unlock, not the chat UI. Farmers don’t need another bot. They need something that remembers their fields and gets cheaper to run over time.

Forgetting is as important as remembering. Without TTL and recency decay, episodic memory becomes noise within a season.

Model tiering is an economic feature. Per-turn token reporting in the API response makes the cost visible. On a platform for people who count diesel litres, that transparency matters.

Streaming is not cosmetic. A deep qwen3.7-max reasoning turn over live tools takes ~30 seconds. Behind a spinner that’s a broken product — real farmers on the production platform saw “Thinking…” and gave up. Streaming the same turn over SSE (thinking deltas, tool calls as they fire, then the reply token by token) puts the first event in the browser in under a second. Nothing about the model changed; everything about the experience did.
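On the wire, each of those events is a small text frame in the standard Server-Sent Events format: an `event:` line, a `data:` line, and a blank line. A minimal sketch of the framing follows; the event names and payload fields are hypothetical, not the project's actual protocol.

```typescript
// Hypothetical event shapes for one streamed agent turn.
type AgentEvent =
  | { type: "thinking"; delta: string }
  | { type: "tool_call"; name: string }
  | { type: "token"; delta: string };

// Serialises one event as an SSE frame. JSON.stringify escapes newlines,
// so the payload always fits on a single data line.
function toSseFrame(event: AgentEvent): string {
  const { type, ...payload } = event;
  return `event: ${type}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const frames = [
  toSseFrame({ type: "thinking", delta: "Checking the latest reading" }),
  toSseFrame({ type: "tool_call", name: "get_paddy_status" }),
  toSseFrame({ type: "token", delta: "Paddy 3" }),
].join("");
```

A server would write each frame to a response with `Content-Type: text/event-stream` as soon as it is produced, which is what gets the first event to the browser long before the turn finishes.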

Don’t trust the model with rules you can enforce in code. My prompt said “mirror the farmer’s language”. It worked in every fresh-session test. Then a farmer with a Thai conversation history asked something in English and got Thai back: with enough Thai in the context window, the polite instruction lost. The fix was ten lines of deterministic script detection (Thai Unicode range → Thai, Latin → English) injected as a hard directive per turn. If a rule matters, detect it in code and tell the model; don’t ask it to notice.
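A sketch of what such script detection can look like. The Thai Unicode block is U+0E00 to U+0E7F. The tie-break (any Thai character wins, so a Thai sentence containing "Paddy 3" still counts as Thai), the fallback, and the directive wording are assumptions for illustration rather than the project's actual rules.

```typescript
type Lang = "th" | "en";

const THAI_CHAR = /[\u0E00-\u0E7F]/; // Thai Unicode block
const LATIN_CHAR = /[A-Za-z]/;

// Deterministic detection from the current message only, ignoring history.
function detectLanguage(message: string, fallback: Lang = "en"): Lang {
  if (THAI_CHAR.test(message)) return "th"; // assumption: any Thai script wins
  if (LATIN_CHAR.test(message)) return "en";
  return fallback; // digits, emoji, or an empty message
}

// Hard per-turn directive to prepend to the prompt (hypothetical wording).
function languageDirective(message: string): string {
  return detectLanguage(message) === "th"
    ? "Reply in Thai only."
    : "Reply in English only.";
}
```

Because the check looks only at the current message, a long Thai history in the context window can no longer outvote an English question.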

LLMs narrate instead of acting. In early alert-webhook tests the agent would say “I’ve prepared a pump proposal for you” without ever calling propose_irrigation. The cure was state-aware prompting: the compose step is told explicitly whether a proposal was actually created this turn, and instructed never to claim otherwise. Trust, but verify — against your own tool trace.
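The idea reduces to deriving a fact from the tool trace and stating it to the compose step. A minimal sketch, assuming a trace of `{ tool, ok }` entries; the trace shape, function names, and directive wording are hypothetical.

```typescript
interface ToolTraceEntry {
  tool: string;
  ok: boolean; // whether the tool call succeeded
}

// Ground truth comes from the trace, not from what the model says it did.
function proposalCreated(trace: ToolTraceEntry[]): boolean {
  return trace.some((entry) => entry.tool === "propose_irrigation" && entry.ok);
}

// Directive for the compose step, stating what actually happened this turn.
function composeDirective(trace: ToolTraceEntry[]): string {
  return proposalCreated(trace)
    ? "A pump proposal WAS created this turn. Tell the farmer it is waiting for their approval."
    : "NO pump proposal was created this turn. Do not say or imply that one exists.";
}
```

A failed `propose_irrigation` call counts as no proposal, so the reply cannot promise something that never reached the approval queue.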

Open source

The project is MIT-licensed and on GitHub. It includes a Docker quick-start, demo mode, swappable storage drivers (local for dev, alibaba + dashvector for production), 110 deterministic tests, a reproducible memory benchmark, smoke scripts, and full docs.

This isn’t a hackathon artifact that dies when the judging ends. It’s running in production for the same KhawTECH farmers I wrote about four years ago — and if it wins, the prize goes straight into more pumps, solar panels and sensors for families in Isan who couldn’t afford them otherwise.