Agentic RAG with Context Engineering: Querying Complex Data without Vectors

By Xiaoyi Zhu

“I want the activity report from last quarter for the top 5 accounts by revenue last year.”

Imagine asking that in plain English—and getting an accurate, friendly breakdown in seconds.

Building a ChatGPT-like assistant for enterprise data can feel like magic. But under the hood, making an LLM reliably fetch and aggregate information from a complex platform is tricky. Traditional Retrieval-Augmented Generation (RAG) works great for static text, but struggles when the “knowledge” lives in databases and APIs, not documents. This post explores an agentic RAG approach that forgoes the usual vector database in favor of context engineering—carefully orchestrating what information the LLM sees and when—so it can converse with your data accurately and securely.


The Headache: Why Classic RAG Breaks on Structured Data

Without a different retrieval strategy, you’ll hit these walls:

  • Not documents: Most enterprise data lives in relational tables and schemas, not articles to embed.
  • Volume & volatility: Indexing every row into a vector DB is costly and goes stale quickly.
  • LLMs aren’t SQL engines: Users ask for joins, filters, windows, and aggregates. Similarity search can’t compute them.
  • Context limits: Shoving raw records into a prompt blows your token budget and hurts quality.
  • APIs aren’t LLM-friendly: UI-oriented payloads are verbose and noisy for models.
  • Infra friction: Many teams don’t want a new vector layer just to answer dynamic, computed questions.

Bottom line: classic RAG assumes the answers already exist as text. In enterprise systems, answers are often derived on the fly via live queries and aggregation.


The Simple Fix: Agentic Retrieval + Context Engineering

Instead of pre-indexing everything, let the LLM use tools to pull just-in-time data—already shaped for the model.

Two pillars

  1. LLM-Optimized Endpoints (Server-Side): Lean, well-structured, ACL-aware APIs that return only essential fields (JSON), optionally pre-aggregated where it helps. Predictable keys, stable shapes, low cardinality.

  2. Context-Engineered Agent (Client-Side): An orchestration loop (sketched below) that:

  • Preloads a concise schema context (entities, key fields, etc.).
  • Plans tool calls and runs them in parallel where possible.
  • Injects only relevant, compact results into the prompt.
  • Condenses history as the chat grows to avoid context rot.

Let the database compute; let the LLM reason and present.
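
As a rough illustration, here is a minimal sketch of that loop in Python. The llm_plan, llm_compose, and call_tool helpers are hypothetical stand-ins for your model client and tool layer, not a specific SDK:

import asyncio

async def answer(question: str, schema_context: str) -> str:
    # 1. Plan: one LLM call decides which tool calls are needed (hypothetical helper).
    plan = await llm_plan(system=schema_context, user=question)

    # 2. Execute: run independent tool calls in parallel against the data layer.
    results = await asyncio.gather(
        *(call_tool(step["fn"], step["args"]) for step in plan["tool_calls"])
    )

    # 3. Compose: inject only the compact JSON results and ask the LLM to narrate.
    return await llm_compose(system=schema_context, user=question, tool_results=results)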


Worked Example

User: “Activity report from last quarter for the top 5 accounts by revenue last year.”

Schema cheat-sheet (preloaded in system context)

  • Entities: concise array of apps with app_name, app_label, description
  • Fields: minimal metadata per field (id, type, labels, reference)
  • Choice List Options: normalized lookups (e.g., Contact.status)
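
A hypothetical slice of that preloaded context might look like this (entity and field names are illustrative, not an exact schema export):

{
  "entities": [
    { "app_name": "Account", "app_label": "Accounts", "description": "Companies you sell to" },
    { "app_name": "Email", "app_label": "Emails", "description": "Logged email activity" }
  ],
  "fields": {
    "Account": [
      { "id": "AnnualRevenue", "type": "currency", "label": "Annual Revenue" },
      { "id": "IsActive", "type": "boolean", "label": "Active" }
    ]
  },
  "choices": {
    "Contact.status": ["Active", "Inactive"]
  }
}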

Pseudocode for the runtime query shape the agent will synthesize:

{
  "fn": "getRecords",
  "args": {
    "entity": "Account",
    "fields": ["Id", "Name"],
    "order_by": [{ "field": "AnnualRevenue", "direction": "DESC", "fiscal_year": "LAST_YEAR" }],
    "filters": [{ "field": "IsActive", "op": "=", "value": true }],
    "limit": 5
  }
}

LLM call loop & tools

  1. Plan (LLM): Identify entities/tools: Account and activity apps (Email, Note, Meeting, PhoneCall, SMS).

  2. Field metadata (tools, parallel): getFields(Account|Email|Meeting|PhoneCall|SMS) → confirm keys like AnnualRevenue[fiscal_year], CreatedAt, AccountId.
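
A hypothetical getFields(Account) response might look like this (the field metadata and the fiscal_year hint are illustrative):

{
  "entity": "Account",
  "fields": [
    { "id": "Id", "type": "id" },
    { "id": "Name", "type": "text" },
    { "id": "AnnualRevenue", "type": "currency", "supports": ["fiscal_year"] },
    { "id": "CreatedAt", "type": "datetime" }
  ]
}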

  3. Select accounts (LLM → tool): Compose top-accounts query (as above) → getRecords(Account):

{
  "records": [
    { "Id": "A1", "Name": "Acme Corp" },
    { "Id": "A2", "Name": "Beta Co" },
    { "Id": "A3", "Name": "Cobalt Ltd" },
    { "Id": "A4", "Name": "Delta LLC" },
    { "Id": "A5", "Name": "Everest Inc" }
  ]
}
  4. Fetch activities (LLM → tools, parallel):
[
  {
    "fn": "getRecords",
    "args": {
      "entity": "Email",
      "fields": ["AccountId", "Subject", "Owner", "CreatedAt"],
      "filters": [
        { "field": "AccountId", "op": "IN", "value": ["A1", "A2", "A3", "A4", "A5"] },
        { "field": "CreatedAt", "op": "BETWEEN", "value": ["2025-04-01", "2025-06-30"] }
      ]
    }
  },
  {
    "fn": "getRecords",
    "args": {
      "entity": "Meeting",
      "fields": ["AccountId", "Owner", "CreatedAt"],
      "filters": [
        /* same */
      ]
    }
  },
  {
    "fn": "getRecords",
    "args": {
      "entity": "PhoneCall",
      "fields": ["AccountId", "Owner", "CreatedAt"],
      "filters": [
        /* same */
      ]
    }
  },
  {
    "fn": "getRecords",
    "args": {
      "entity": "SMS",
      "fields": ["AccountId", "Owner", "CreatedAt"],
      "filters": [
        /* same */
      ]
    }
  },
  {
    "fn": "getRecords",
    "args": {
      "entity": "Note",
      "fields": ["AccountId", "Owner", "CreatedAt"],
      "filters": [
        /* same */
      ]
    }
  }
]
  5. Compose answer (LLM): Merge compact JSON blobs → produce a human-friendly report (totals per app, trend vs. prior quarter, notable owners/subjects).
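
If you prefer to pre-aggregate before the final LLM pass (keeping the model's job to narration), the per-app totals are just a group-and-count over the compact activity payloads. A minimal sketch, assuming record shapes like the getRecords responses above:

from collections import Counter

def activity_totals(results_by_entity: dict[str, list[dict]]) -> dict[str, Counter]:
    # Count activities per account for each activity app (Email, Meeting, ...).
    totals: dict[str, Counter] = {}
    for entity, records in results_by_entity.items():
        totals[entity] = Counter(r["AccountId"] for r in records)
    return totals

# e.g. {"Email": [{"AccountId": "A1", ...}, ...]} -> {"Email": Counter({"A1": 12, "A2": 7})}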

Why this works well

  • Only 3 LLM calls for reasoning & narration; databases/APIs do the math.
  • Parallel tools slash latency.
  • Token-efficient: concise schema + compact JSON > raw tables or long text chunks.

Technical Deep Dive

1) Building LLM-Friendly Data Endpoints

  • Lean responses: return only what the LLM needs (e.g., { "OpenTasksCount": 42 }).
  • Predictable JSON: stable keys, low ambiguity.
  • Optimized queries: indexes, ORDER BY … LIMIT N, and materialized views for hot paths (“top accounts by last year’s revenue”).
  • ACL-aware: enforce permissions at the data layer so the model never sees unauthorized rows.
  • Why not existing UI APIs? They’re chatty and UI-shaped. LLM-specific endpoints routinely cut payloads 70–80% and reduce latency.
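
As a sketch of what such an endpoint could look like (acl_scope_for and run_query are hypothetical helpers for your own ACL and query layers, not an existing API):

def get_records(entity, fields, filters, order_by=None, limit=50, user=None):
    # Enforce permissions at the data layer: the model never sees unauthorized rows.
    acl_filter = acl_scope_for(user, entity)

    # Let the database do the work: filtering, sorting, ORDER BY ... LIMIT N.
    rows = run_query(
        entity=entity,
        columns=fields,               # only the fields the LLM asked for
        where=filters + [acl_filter],
        order_by=order_by,
        limit=min(limit, 200),        # hard cap to protect the token budget
    )

    # Lean, predictable JSON shape: stable keys, no UI metadata.
    return {"records": [{f: row[f] for f in fields} for row in rows]}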

2) Context Bootstrapping

Seed the session with:

  • Entity list + one-liners (Account, Contact, Opportunity, Activity types)
  • Key fields (labels, filterability, typical sort keys)
  • Other schema data the agent needs (e.g., choice-list options)

This costs only a few hundred tokens and drives far better tool selection.

3) Step-by-Step Agent Reasoning

  • Parse intent: last quarter, top 5 by last year revenue, multiple activity types.
  • Decide tools: getFields → getRecords(Account) → getRecords(Activity*).
  • Run parallel calls where independent.
  • Insert compact JSON; truncate or pre-aggregate long lists.
  • Final LLM pass focuses on explanation/formatting.

4) Parallelization & Efficiency

  • Cache field metadata and reuse across turns.
  • Fan-out activity fetches per entity.
  • Typical pattern: 2–3 LLM calls + N parallel tool calls → chat-like responsiveness.
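
A minimal asyncio fan-out for the per-entity activity fetches, assuming the same hypothetical call_tool wrapper as earlier:

import asyncio

ACTIVITY_ENTITIES = ["Email", "Meeting", "PhoneCall", "SMS", "Note"]

async def fetch_activities(account_ids: list[str], date_range: list[str]) -> dict[str, list[dict]]:
    filters = [
        {"field": "AccountId", "op": "IN", "value": account_ids},
        {"field": "CreatedAt", "op": "BETWEEN", "value": date_range},
    ]
    # One getRecords call per activity entity, all in flight at once.
    calls = [
        call_tool("getRecords", {"entity": e, "fields": ["AccountId", "Owner", "CreatedAt"], "filters": filters})
        for e in ACTIVITY_ENTITIES
    ]
    results = await asyncio.gather(*calls)
    return dict(zip(ACTIVITY_ENTITIES, results))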

5) Context Window Management

  • Track token usage; at ~80% of the model’s limit, summarize early turns and drop raw blobs.
  • Keep crisp session memory (e.g., { "TopAccountsFY2024": ["A1","A2","A3","A4","A5"] }) to avoid replay.
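
One way to sketch that trimming policy (the rough token estimate and the summarize_turns helper are placeholders, not a specific tokenizer or API):

def compact_history(messages: list[dict], limit_tokens: int, session_memory: dict) -> list[dict]:
    # Rough token estimate; swap in your model's tokenizer for real counts.
    used = sum(len(m["content"]) // 4 for m in messages)
    if used < 0.8 * limit_tokens:
        return messages

    # Summarize early turns and drop raw JSON blobs; keep crisp session memory.
    summary = summarize_turns(messages[:-4])  # hypothetical LLM summarization call
    return [
        {"role": "system", "content": f"Conversation so far: {summary}"},
        {"role": "system", "content": f"Session memory: {session_memory}"},
        *messages[-4:],                       # keep the most recent turns verbatim
    ]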

Benchmarks & Metrics (Qualitative)

  • Latency: Faster and more predictable by eliminating vector search and compressing payloads. LLM time dominates; the APIs themselves return in tens to low hundreds of milliseconds.
  • Accuracy: Answers grounded in current DB facts; numeric work done by the data layer curbs hallucinations.
  • Token efficiency: Small JSON slices vs. large text chunks → cheaper models, longer coherent sessions.
  • Scalability: No re-embedding treadmill; scale with your DB, caches, and infra you already operate.

Best Practices & Caveats

Do

  • Design minimal LLM endpoints.
  • Preload essential schema context only.
  • Let tools handle filters/aggregations; LLMs explain and summarize.
  • Parallelize independent tool calls.
  • Enforce permissions at the server layer.
  • Continuously curate context (summarize/trim).

Don’t

  • Dump entire schemas or tables into prompts.
  • Rely solely on semantic search for structured queries.
  • Chain too many sequential LLM calls—plan broadly, then execute.
  • Ignore token budgets—cap result sizes/fields and prefer aggregates.
  • Skimp on timeouts, retries, or “not found” branches.

Lessons Learned / Key Takeaways

  • RAG is evolving: For structured data, combine LLM reasoning with live queries and tools.
  • Context engineering is king: High-quality answers come from what you show the model and when.
  • LLMs + Databases = ❤️: Let the DB compute; let the LLM communicate.
  • Engineering wins: Caching, parallelism, summaries, and API design often beat model tweaks.

Wrapping Up

Simply bolting an LLM onto a vector store isn’t enough for complex enterprise SaaS. If you’re building an AI assistant, the pattern that scales is agentic retrieval + context engineering: let the agent treat data access as a dialogue, pull just-in-time, permissioned facts via tools, keep the model’s context lean, and delegate computation to the systems built for it. Curate the right context, at the right moment, in the right format—and your assistant will be faster, cheaper, and, most importantly, trustworthy.