When I first shipped my Retrieval-Augmented Generation app it felt like a magic trick. A user typed a question, semantic search surfaced the right document chunks, and a Mistral LLM wove them into a coherent answer. That was enough — until it wasn't.
As query complexity grew, three cracks appeared: a single agent trying to do everything lost context on multi-step tasks, vector search alone missed the web of relationships hiding across documents, and plain Markdown responses made rich data feel lifeless. Here is how I rebuilt the entire stack to fix all three.
1. Why Basic RAG Hits a Ceiling
Standard RAG is a retrieval shortcut layered on top of an LLM. It excels at single-hop, semantic questions — "What does our leave policy say about parental leave?" — but starts to wobble when queries require:
- Multi-step reasoning — aggregate a dataset and cross-reference a policy in one shot.
- Relational awareness — "Who leads the department that owns the remote-work policy?" spans entities that vector cosine similarity cannot bridge.
- Task specialization — code review, chart generation, and SQL execution all demand different tools; cramming them into one agent creates a context-length disaster.
Recognising these limits was the architectural turning point. The solution wasn't to swap the model — it was to rethink the entire system.
2. Building a Supervisor–Worker Multi-Agent System
The fix was to stop thinking in "one agent, many tools" and start thinking in teams. I implemented a Supervisor–Worker architecture using LangGraph, where a lightweight orchestrator agent delegates to purpose-built worker agents.
Every incoming user message goes to the Supervisor Agent. It classifies intent, breaks the request into sub-tasks if necessary, and dispatches each sub-task to the right specialist:
- DataAnalyzerAgent — writes and executes Python / SQL against structured datasets, returns typed results.
- SupportAgent — handles knowledge-base retrieval, drafts responses grounded in retrieved documents.
- CodeReviewAgent — reads code snippets, runs static checks, and produces structured review objects.
The Supervisor collects each worker's output, aggregates the results into a coherent reply, and streams it back to the user. If a worker stalls or errors out, the Supervisor can retry with a fallback agent — something a monolithic tool-caller can never do cleanly.
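The delegation loop above can be sketched in plain Python. This is a minimal stand-in, not the production system: the real implementation runs on LangGraph and uses an LLM for intent classification, whereas the keyword router and stub worker functions here are hypothetical placeholders that only illustrate the control flow.

```python
# Hypothetical supervisor-worker sketch. The keyword-based classify() stands
# in for LLM intent classification; each worker would wrap a real agent.

def data_analyzer_agent(task: str) -> str:
    # Would write and execute Python / SQL against structured datasets.
    return f"[data] {task}"

def support_agent(task: str) -> str:
    # Would run knowledge-base retrieval and draft a grounded answer.
    return f"[support] {task}"

def code_review_agent(task: str) -> str:
    # Would run static checks and return a structured review object.
    return f"[review] {task}"

WORKERS = {
    "data": data_analyzer_agent,
    "support": support_agent,
    "review": code_review_agent,
}

def classify(task: str) -> str:
    # Stand-in for LLM intent classification.
    lowered = task.lower()
    if any(w in lowered for w in ("sales", "dataset", "summarise", "summarize")):
        return "data"
    if any(w in lowered for w in ("code", "review", "snippet")):
        return "review"
    return "support"

def supervisor(subtasks: list[str]) -> str:
    outputs = []
    for task in subtasks:
        worker = WORKERS[classify(task)]
        try:
            outputs.append(worker(task))
        except Exception:
            # Fallback path: retry with the general-purpose agent.
            outputs.append(support_agent(task))
    # Aggregate worker outputs into one reply (streaming omitted here).
    return "\n".join(outputs)
```

The key property is that routing, retry, and aggregation live in one place, outside any single worker's context window.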
Before: a user asking to "summarise Q3 sales and check whether the data privacy policy covers the new data residency requirement" would time out or hallucinate connections the single agent couldn't make.
After: the Supervisor fires DataAnalyzerAgent for the sales summary in parallel with SupportAgent for the policy check. Both return in under four seconds. The Supervisor stitches the answer.
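Because the two sub-tasks are independent, the Supervisor can fan them out concurrently. A minimal sketch with the standard library (the worker functions here are hypothetical; in practice each call wraps an LLM-backed agent, so the work is I/O-bound and threads are a reasonable fit):

```python
# Fan out independent sub-tasks and collect results in submission order.
from concurrent.futures import ThreadPoolExecutor

def run_in_parallel(calls):
    """calls: list of (worker_fn, task) pairs; returns results in order."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(fn, task) for fn, task in calls]
        return [f.result() for f in futures]
```

Wall-clock latency then tracks the slowest worker rather than the sum of all workers.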
3. Elevating Retrieval with GraphRAG + Neo4j
Even with smarter agents, the SupportAgent's retrieval was still vector-only. That meant relational queries like "Which department owns the remote-work policy, and who is its director?" would surface the policy document but miss the organizational data living elsewhere.
I added a Neo4j Knowledge Graph as a second retrieval channel. At ingestion time, documents are parsed for named entities — people, departments, policies, projects — and the relationships between them are stored as graph edges. The retrieval pipeline now runs two queries in parallel:
- Vector search (Chroma + HyDE) — finds semantically similar document chunks.
- Graph traversal (Cypher) — follows entity relationships to pull connected facts.
Both result sets are merged into a single, enriched context block passed to the LLM. The model now gets semantic proximity and structured relational facts — dramatically reducing hallucination on anything requiring cross-document reasoning.
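A sketch of that merge step. The Cypher query and both retrieval inputs are illustrative (entity labels like `Policy` and relationships like `OWNS` are assumptions about the schema, not the actual one); real calls would go through Chroma and the Neo4j driver.

```python
# Hypothetical graph traversal for "which department owns this policy,
# and who directs it?" -- labels and relationship names are assumed.
SAMPLE_CYPHER = """
MATCH (p:Policy {name: $policy})<-[:OWNS]-(d:Department)<-[:LEADS]-(person:Person)
RETURN d.name AS department, person.name AS director
"""

def merge_context(vector_chunks: list[str], graph_facts: list[str]) -> str:
    """Merge both retrieval channels into one context block for the LLM."""
    seen, merged = set(), []
    # Deduplicate while preserving order: semantic chunks first, then facts.
    for item in vector_chunks + graph_facts:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return "\n".join(merged)
```

Ordering chunks before graph facts is one design choice among several; interleaving by a relevance score would work equally well.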
On a test set of 120 relational questions across 40 internal documents, GraphRAG improved exact-match accuracy from 61% to 89% compared to vector-only retrieval. The biggest gains were on multi-hop questions — exactly the category where basic RAG fails hardest.
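The evaluation harness behind numbers like these can be very small. A sketch, where `answer_fn` stands in for the full retrieval-plus-generation pipeline (the normalisation here is a simplifying assumption; a production harness would handle paraphrases and partial matches):

```python
# Exact-match accuracy over a list of (question, expected_answer) pairs.
def exact_match_accuracy(examples, answer_fn):
    correct = sum(
        1 for question, expected in examples
        if answer_fn(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(examples)
```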
4. Replacing Markdown with Generative UI
The third pillar was the interface. Users asking for a financial breakdown were getting a wall of Markdown tables. The AI was smart; the presentation wasn't.
Using the Vercel AI SDK's streaming UI primitives, I extended the agent responses beyond plain text. Instead of returning a string, agents can now trigger tool calls that map to React components rendered directly in the chat window:
- render_chart(data, type) → streams an interactive Recharts bar / line graph, hover-enabled, theme-matched.
- render_table(rows, columns) → renders a sortable, paginated data table with column-level filtering.
- render_form(schema) → generates a type-safe settings form the user can submit to trigger follow-up actions.
- render_code(snippet, lang) → syntax-highlighted code block with a one-click copy button.
The LLM decides which component to use based on the nature of the answer. If the data is temporal, it picks a line chart. If it's comparative, a bar chart. Plain factual answers still return as prose — no unnecessary chrome.
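In production this decision happens through the LLM's tool choice in the TypeScript/Vercel AI SDK layer, but the selection logic itself can be sketched as a plain function. The field names and categories below are illustrative assumptions, not the actual schema:

```python
# Heuristic mirroring the component choice the LLM makes via tool calls.
# Field names ("date", "timestamp") and return labels are illustrative.
def choose_component(data: list[dict]) -> str:
    if not data:
        return "render_text"
    first = data[0]
    if "date" in first or "timestamp" in first:
        return "render_chart:line"   # temporal data -> line chart
    if any(isinstance(v, (int, float)) for v in first.values()):
        return "render_chart:bar"    # comparative numeric data -> bar chart
    return "render_table"            # otherwise a sortable table
```

Encoding the fallback to prose ("render_text") explicitly is what keeps simple factual answers free of unnecessary chrome.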
Before: "Show me Q3 sales by region" returned a 200-line Markdown table nobody wanted to scroll through.
After: A colour-coded bar chart loads inline, tooltip on hover, total at the footer. Time-to-insight drops from minutes to seconds.
5. Putting It All Together
The three upgrades are not independent features — they form a layered stack:
- The Supervisor–Worker system decides who handles each part of a request.
- GraphRAG ensures each agent gets richer, more accurate context.
- Generative UI ensures the final answer is rendered in the most useful form for the user.
Strip any one layer out and the experience degrades noticeably. Together, they turn a capable research tool into something that genuinely feels like bespoke enterprise software.
Key Takeaways
- Architecture > model choice. Switching from GPT-4 to Claude changes outputs at the margin. Switching from a monolithic agent to a multi-agent system changes what is even possible.
- Vector search is necessary, not sufficient. The moment your data has meaningful relationships, you need a graph alongside the embeddings.
- UI is part of the AI product. A beautiful answer formatted poorly is a bad answer. Generative UI closes that gap.
- Parallelism is free performance. Running DataAnalyzer and SupportAgent concurrently roughly halved response latency on complex queries — no hardware changes needed.
The RAG app I started with was a proof of concept. What it became is an autonomous, enterprise-grade assistant — one that can reason across silos, traverse knowledge graphs, and present its findings in a form users actually want to use.
If you're hitting the ceiling on basic RAG, the answer isn't a better prompt. It's a better architecture.
Let's build smart. Let's build together.
— Gopal