Vector RAG? Agentic Search? Why Not Both?

- 26 February 2025 - 13 mins read

A Hacker News thread about AI coding assistants and context retrieval strategies recently got popular. The discussion started with Salvatore Sanfilippo (antirez, the creator of Redis) commenting on Claude’s approach:

One of the silver bullets of Claude, in the context of coding, is that it does NOT use RAG when you use it via the web interface. Sure, you burn your tokens but the model sees everything and this let it reply in a much better way. Is Claude Code doing the same and just doing document-level RAG, so that if a document is relevant and if it fits, all the document will be put inside the context window? I really hope so! Also, this means that splitting large code bases into manageable file sizes will make more and more sense.

Boris Cherny, a Claude Code engineer, responded:

Claude Code doesn’t use RAG currently. In our testing we found that agentic search out-performed RAG for the kinds of things people use Code for.

The thread that followed was a battleground. Some argued that agentic search burns tokens unnecessarily. Others defended semantic search as essential for understanding large codebases.

What do I think?

Here’s what I’ve learned after building production RAG systems for over two years and working with vector databases and semantic search for more than a decade: this isn’t a binary choice. The best setups use both, and understanding when to reach for each one is what’s actually worth talking about.

The False Dichotomy

antirez’s observation about Claude’s web interface is spot-on: when the model “sees everything”, it can reason better. No chunking artifacts, no missing context, no semantic search errors. Just raw, complete information.

But here’s the sad reality: not all codebases fit in a context window. And that’s where the real engineering challenge begins.

The reality is that both approaches solve different problems, and the best tools (like IntelliJ AI Assistant and Cursor) leverage both to optimize for the AI’s context window and the developer’s intent. This isn’t theoretical for me: I’ve built RAG systems that handle millions of documents, and I’ve seen firsthand where each approach excels and where it fails.

His point about “splitting large codebases into manageable file sizes” is valid, but that’s just good software architecture anyway. And, again, the sad reality is that not every codebase is a greenfield project with perfect file organization. Sometimes you inherit a 500k-line monolith. Sometimes you’re debugging interactions across several repositories or microservices. Sometimes the relevant context is scattered across documentation, comments, and code. That’s when you need more sophisticated retrieval.

bcherny’s point that “agentic search out-performed RAG” isn’t wrong either; it’s context-dependent. From my decade of working with search systems, I know there are scenarios where precise, literal string matching is exactly what you need.

When I’m hunting down every usage of a specific function name (say, calculateTotalPrice), I don’t want semantic “close-enough” matches. I want every single literal occurrence. Grep nails this. No ambiguity, no false positives from semantically similar but functionally different code. It gives you the complete list in milliseconds.
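
To make that concrete, here’s a minimal sketch of what literal matching boils down to. This is illustrative TypeScript, not what any real tool runs (real tools typically shell out to something like ripgrep and skip directories like node_modules); walkFiles and grepLiteral are names I made up for this post:

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Recursively yield every file under a directory.
function* walkFiles(dir: string): Generator<string> {
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) yield* walkFiles(path);
    else yield path;
  }
}

// Every literal occurrence, with file and line number. No semantic
// guessing, no "close enough": the needle is there or it isn't.
function grepLiteral(root: string, needle: string): { file: string; line: number }[] {
  const hits: { file: string; line: number }[] = [];
  for (const file of walkFiles(root)) {
    readFileSync(file, "utf8").split("\n").forEach((text, i) => {
      if (text.includes(needle)) hits.push({ file, line: i + 1 });
    });
  }
  return hits;
}

console.log(grepLiteral("./src", "calculateTotalPrice"));
```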

But let me be clear: there are scenarios where semantic understanding crushes literal matching. The question “Where do we handle authentication?” doesn’t map to a single keyword. Maybe it’s in AuthService, or UserValidator, or SessionManager, or scattered across middleware and node_modules. Vector RAG understands the concept and surfaces relevant code regardless of naming conventions.

When you’re dealing with legacy code where naming conventions changed over time, or code written by teams that spoke different “dialects” of engineering terminology, vector search becomes essential. When I ask “How does our payment flow work?” I need the AI to pull together multiple files, trace relationships, and understand the bigger picture. Vector embeddings excel here because they encode semantic relationships between code chunks.
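
The core of that semantic lookup is small enough to sketch. This is a hedged illustration, not any particular tool’s implementation: Chunk and Embedder are types I’m assuming here, and the embedding call is whatever model you plug in:

```typescript
type Chunk = { file: string; text: string; embedding: number[] };
type Embedder = (text: string) => Promise<number[]>; // your embedding model call

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank indexed chunks by similarity to the query. A question like
// "How does our payment flow work?" can surface billing code, webhook
// handlers, and docs without sharing a single keyword with them.
async function semanticSearch(
  embed: Embedder,
  query: string,
  index: Chunk[],
  k = 5,
): Promise<Chunk[]> {
  const q = await embed(query);
  return [...index]
    .sort((a, b) => cosine(q, b.embedding) - cosine(q, a.embedding))
    .slice(0, k);
}
```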

And what about documentation? Sometimes the best context isn’t in the code itself, but in comments, README files, or inline documentation. Vector search can surface relevant explanations that grep would miss unless you knew the exact wording.

Adapting to Context Windows

Here’s where my experience building RAG systems taught me something important: the best approach depends on your context window constraints.

antirez’s question about whether Claude Code uses “document-level RAG” hits on something important: if you can fit the entire relevant document in the context window, do it. Don’t chunk. Don’t summarize. Just send the whole damn thing.
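
The decision itself is almost trivially simple to sketch. Assume a rough token heuristic (a real system would use the model’s actual tokenizer) and blank-line block splitting as the fallback; fitDocument is a name I invented for this illustration:

```typescript
// Rough stand-in for a real tokenizer: ~4 characters per token on average.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

type Retrieved = { file: string; content: string; truncated: boolean };

// If the whole file fits the remaining budget, send it verbatim. Only
// when it doesn't, fall back to keeping whole blocks in order.
function fitDocument(file: string, content: string, budget: number): Retrieved {
  if (countTokens(content) <= budget) {
    return { file, content, truncated: false }; // the model sees everything
  }
  const blocks = content.split(/\n{2,}/); // blank-line-separated blocks
  const kept: string[] = [];
  let used = 0;
  for (const block of blocks) {
    const cost = countTokens(block);
    if (used + cost > budget) break;
    kept.push(block);
    used += cost;
  }
  return { file, content: kept.join("\n\n"), truncated: true };
}
```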

But what about when you can’t? What about when you’re working with smaller token context windows?

This is where tools like Cursor shine: they don’t force you to choose. Cursor uses local codebase indexing with vector embeddings for semantic understanding, while still supporting precise symbol search and navigation when you need it. It adapts to context windows:

  • For broad queries: Vector RAG retrieves semantically relevant chunks to fit the context budget
  • For specific lookups: Direct symbol references and grep-style precision
  • For complete context: When a file fits, it sends the whole file (antirez’s point)
  • For complex tasks: Hybrid retrieval that combines both approaches

This is why I prefer these hybrid tools over purely grep-based approaches. The context window optimization happens automatically, switching strategies based on what the AI needs, what fits, and what will produce the best result. When done right, hybrid retrieval can be more token-efficient than either pure approach because it returns exactly what’s needed.
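
Here’s roughly how I picture that routing, as a sketch with my own heuristics, not Cursor’s actual logic:

```typescript
type Strategy = "whole-file" | "grep" | "vector" | "hybrid";

// Pick a retrieval strategy from the query shape and the context budget.
// The rules below are illustrative guesses, not any tool's real router.
function chooseStrategy(query: string, fileTokens: number, budget: number): Strategy {
  if (fileTokens <= budget) return "whole-file"; // antirez's point: it fits, send it all
  if (/^[\w.$]+$/.test(query.trim())) return "grep"; // bare symbol, e.g. SessionManager.validate
  if (/\b(how|where|why|what)\b/i.test(query)) return "vector"; // conceptual question
  return "hybrid"; // mixed intent: run both, merge and rerank
}
```

In practice the real signal is richer (symbol indexes, open buffers, edit history), but the shape of the decision is the same.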

The models with larger context windows (like Sonnet’s 200K tokens) give us more room to work with, but that doesn’t mean we should just dump everything in. It means we have more flexibility in choosing the right granularity for each query. I still remember fighting OpenAI’s 8K-token context windows for a while. Building for those constraints was a pain, but it made you think about context optimization like nothing else.

My Preferred Workflow

After using various AI assistants, here’s what works for me:

  1. Start with natural language (leverages vector RAG): “How do we handle user sessions?”
  2. Refine with specific symbols (uses agentic search): “Show me all usages of SessionManager.validate”
  3. Let the tool decide which retrieval strategy makes sense for follow-ups

Cursor handles this seamlessly. Rather than arguing over which approach is “better”, we should be asking: “Given this specific query, this codebase structure, and this context window, what’s the optimal retrieval strategy?”. This is similar to how I approach fundamentals in the AI era, where you need to understand what’s happening under the hood, but you don’t need to manually orchestrate every detail.

The best tools understand this. They don’t force you to choose between burning tokens for perfect context or losing signal through aggressive chunking. They make intelligent tradeoffs based on what you’re trying to accomplish. My problem with Claude Code is exactly this: it only uses agentic search. No vector RAG. No semantic understanding of your codebase. When you ask it a conceptual question like “Where do we handle authentication?” or “How does our payment flow work?”, it’s limited to literal string matching and file-by-file exploration.

And, honestly, I believe Cursor is just doing a freaking good job at prompt design.

