Code is not a document. Stuffing code into a vector store works up to a point, but answering "who calls this function?" or "which tests break if I change this line?" is completely beyond vector retrieval. combo agent splits "getting code into the knowledge base" into three indexing tiers — how much initialization time you invest determines the depth of questions you can answer.
One Table to Understand All Three Tiers
| Tier | What You Need to Do | When It's Ready | Questions It Can Answer | Who It's For |
|---|---|---|---|---|
| Zero index | Just drop the repo and search | Instant | "Does this repo have authentication code?" "Which files mention this API?" | Ad-hoc debugging, newly inherited code, small repos you won't maintain long-term |
| Light index | Wait for initial sync (minutes, depends on repo size) | Seconds after any file change | "Where is this function defined?" "Who calls it?" "What are all its imports?" | Daily code review, new-hire onboarding, the vast majority of business repos |
| Heavy index | Set up a compilable build environment first (10 minutes to hours, depending on the project) | Available after full analysis completes | "Which downstreams are affected if I change this line?" "What does this inheritance hierarchy look like?" "All cross-compilation-unit references" | Legacy C/C++ projects, security audits, cross-module large refactors |
Why Not "One Tier for Everything"
Many products claim "one tier handles everything." In practice, such products fall short in one of two ways:
- Not deep enough: only literal grep, can't answer call relationships
- Too slow: every repo needs "half-compilation," initialization takes tens of minutes, and you have to wait again next time
Our judgment: index depth and initialization cost are a trade-off — picking the right tier per scenario is the engineering approach.
Zero Index: Search Right Away
No index is built. Queries rely on syntax-level matching plus the LLM reading the relevant files directly.
Good for:
- "What does this repo mainly do? Give me an overview"
- "Search for code that handles timeouts"
- "I remember there was a file called `payment_*.py`, help me find it"
Not good for:
- "Who calls this function" — zero index has no call graph, it can only tell you where the name literally appears
- Cross-language calls, cross-file inheritance inference
Practical advice: If you just want AI to "take a look at this repo," zero index is enough. Its biggest advantage is no waiting.
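To make the limits of zero index concrete, here is a minimal sketch of what literal matching can and cannot do. It is not the platform's implementation: the helper names (`find_files`, `grep_name`, `build_demo_repo`) and the tiny demo repo are hypothetical, and plain glob plus regex matching stands in for the syntax-level matching described above.

```python
import re
import tempfile
from pathlib import Path


def find_files(root: Path, pattern: str) -> list[str]:
    """Return repo-relative paths whose file name matches a glob pattern."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob(pattern) if p.is_file()
    )


def grep_name(root: Path, name: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every literal occurrence of `name`.

    This is pure text matching: a comment, an import, a definition, and a call
    all look the same, so it cannot answer "who calls this function?".
    """
    word = re.compile(rf"\b{re.escape(name)}\b")
    hits = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if word.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


def build_demo_repo(root: Path) -> None:
    """Create a tiny hypothetical repo used only for this illustration."""
    (root / "billing").mkdir()
    (root / "billing" / "payment_gateway.py").write_text(
        "def charge(amount):\n    return amount\n", encoding="utf-8"
    )
    (root / "app.py").write_text(
        "from billing.payment_gateway import charge\n"
        "# TODO: rename charge later\n"
        "total = charge(10)\n",
        encoding="utf-8",
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        build_demo_repo(repo)
        print(find_files(repo, "payment_*.py"))
        for hit in grep_name(repo, "charge"):
            print(hit)
```

In the demo repo, the search for `charge` returns the import, the comment, the call, and the definition without distinguishing between them. Telling those apart is what the next tier adds.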
Light Index: The Daily Workhorse
The platform uses local tools (tree-sitter-based syntax analysis) to build a lightweight index, extracting the locations of functions, classes, variables, and imports for each language, stored in the search engine.
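The kind of extraction described here can be sketched in a few lines. This example deliberately swaps in Python's standard-library `ast` module for tree-sitter so that it runs without extra dependencies; the function name `extract_symbols`, the record shapes, and the sample file are assumptions for illustration, not the platform's schema.

```python
import ast


def extract_symbols(source: str, path: str) -> dict[str, list]:
    """Collect definition, import, and call locations from one Python file."""
    tree = ast.parse(source, filename=path)
    symbols: dict[str, list] = {"functions": [], "classes": [], "imports": [], "calls": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols["functions"].append((node.name, path, node.lineno))
        elif isinstance(node, ast.ClassDef):
            bases = [ast.unparse(base) for base in node.bases]
            symbols["classes"].append((node.name, bases, path, node.lineno))
        elif isinstance(node, ast.Import):
            symbols["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            symbols["imports"].append(node.module or ".")
        elif isinstance(node, ast.Call):
            # Calls are matched by name only; no type inference happens here.
            symbols["calls"].append((ast.unparse(node.func), path, node.lineno))
    return symbols


# Hypothetical file content; it is parsed, never executed.
SAMPLE = """\
import json
from orders.models import Order

class PriorityOrder(Order):
    pass

def process_order(order):
    return json.dumps(order)

def handle(request):
    return process_order(request.order)
"""

if __name__ == "__main__":
    result = extract_symbols(SAMPLE, "orders/service.py")
    print(result["functions"])
    print([call for call in result["calls"] if call[0] == "process_order"])
```

Because calls are resolved by name rather than by type, this level of analysis answers "where is it defined and who calls it" but not questions that depend on which concrete type a value has at a call site.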
Key Upgrade (M4 / R8+)
- Change one file and the new content is searchable within seconds; this used to take minutes
- Old "batch wait / batch rebuild" mechanism removed; write code and query immediately
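One way to picture per-file incremental indexing, as opposed to batch rebuilds, is the sketch below. It is an assumption-laden illustration, not the platform's mechanism: the `IncrementalIndex` class, the content-hash check, and the regex-based `def_names` extractor are all hypothetical stand-ins.

```python
import hashlib
import re


def def_names(source: str) -> set[str]:
    """Hypothetical extractor: names of functions defined with `def`."""
    return set(re.findall(r"^\s*def\s+(\w+)", source, flags=re.MULTILINE))


class IncrementalIndex:
    """Keeps one index entry per file and rewrites only the entry that changed."""

    def __init__(self) -> None:
        self._hashes: dict[str, str] = {}         # path -> content hash
        self._symbols: dict[str, set[str]] = {}   # path -> defined names

    def update_file(self, path: str, source: str) -> bool:
        """Re-index one file; return False if its content is unchanged."""
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if self._hashes.get(path) == digest:
            return False
        self._hashes[path] = digest
        self._symbols[path] = def_names(source)
        return True

    def remove_file(self, path: str) -> None:
        """Drop a deleted file from the index."""
        self._hashes.pop(path, None)
        self._symbols.pop(path, None)

    def lookup(self, name: str) -> list[str]:
        """Return the files that define `name`."""
        return sorted(p for p, names in self._symbols.items() if name in names)


if __name__ == "__main__":
    index = IncrementalIndex()
    index.update_file("a.py", "def foo():\n    pass\n")
    index.update_file("a.py", "def bar():\n    pass\n")
    print(index.lookup("foo"), index.lookup("bar"))
```

The cost of an update scales with the size of the changed file rather than the size of the repository, which is why results can be queryable right after a save.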
Good for
- "Where is `processOrder` defined and who calls it?"
- "What does this class inherit from? What are its subclasses?"
- "How many places in the whole project use `redis.Pipeline`?"
- "List all external modules imported by this file"
Not good for
- Questions requiring type inference (C++ template instantiation, Python duck-typing runtime types)
- Cross-compilation-unit, cross-link-boundary references
- "Which downstreams are affected if I change this line?" for an ASIL-D module
Heavy Index: Audits / Refactors / Legacy C/C++
Heavy index actually "half-compiles" the project to build a fully typed code graph (based on the SCIP semantic graph). It is the most expensive tier and the most accurate.
Right Scenarios
| Scenario | Why Heavy Index Is Required |
|---|---|
| Legacy C/C++ projects | Macros, templates, include graphs are complex; without type info, you can't really audit |
| Security audits | Track user input flow to sensitive functions (taint analysis) |
| Cross-module large refactors | Changing a core interface requires seeing all downstreams first |
| Code compliance review | MISRA / ASPICE / ISO 26262 require "complete call chain traceability" |
| ASIL-D safety analysis | SOTIF / FMEA need line-level impact propagation |
Prerequisites
- Your project can build in our environment (CMake / Bazel / Maven, etc.)
- You can accept a one-time initialization that takes anywhere from 10 minutes to hours
- Your project's language is in the SCIP support matrix (Java / Python / Go / Rust / TypeScript / C/C++ (partial))
Real-World Example
On automotive ECU code, heavy index can answer: "I changed can_send_frame — which CAN buses are ultimately affected, and which ASIL-level requirements do they correspond to?" — light and zero index cannot answer this.
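Once a complete call graph exists, the "which downstreams are affected" question reduces to a reverse reachability search. The sketch below shows that idea under stated assumptions: the call edges and function names other than `can_send_frame` are invented for illustration, and a real heavy index would read resolved references from the SCIP graph rather than a hand-written dictionary.

```python
from collections import defaultdict, deque


def invert(calls: dict[str, list[str]]) -> dict[str, set[str]]:
    """Turn caller -> callees edges into callee -> callers edges."""
    callers: dict[str, set[str]] = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
    return callers


def affected_by(changed: str, calls: dict[str, list[str]]) -> set[str]:
    """Return every function that directly or transitively calls `changed`."""
    callers = invert(calls)
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    seen.discard(changed)  # only relevant if the changed function is recursive
    return seen


# Hypothetical call edges (caller -> callees) for an ECU code base.
CALLS = {
    "brake_status_task": ["can_send_frame"],
    "diag_reply": ["can_send_frame"],
    "scheduler_tick": ["brake_status_task"],
    "uds_handler": ["diag_reply"],
    "log_flush": ["uart_write"],
}

if __name__ == "__main__":
    print(sorted(affected_by("can_send_frame", CALLS)))
```

The traversal itself is simple; the hard part, and the reason this tier needs a build environment, is producing edges that are complete and correctly resolved across compilation units.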
Scenario-to-Tier Recommendations
| Scenario | Recommended Tier | Notes |
|---|---|---|
| Daily PR / MR code review | Light | Second-level response |
| New hire onboarding | Light | Call relationships are sufficient |
| Quick review of an outsourced commit | Zero | Not worth waiting |
| Browsing an open-source repo | Zero | Instant |
| ASPICE compliance audit / BP assessment | Heavy | Needs complete call chains |
| Cross-module large refactor | Heavy | Must see all downstreams |
| Security audit / taint analysis | Heavy | Cross-compilation-unit |
| SOTIF / FMEA impact analysis | Heavy | Line-level precision |
Ragbase Code Index Full Stack (L1–L6)
The sales-facing "zero / light / heavy" tiers are product packaging. The real technical layers are six deep:
| Layer | Index Type | Source | Trigger |
|---|---|---|---|
| L1 | Text chunks + embeddings | Any parser | Always |
| L2 | BM25 full-text inverted index | Any parser | Always |
| L3 | AST chunks | tree-sitter | parser_id=code-aware |
| L4 | R8 keyword fields (functions / classes / imports / fqn) | tree-sitter AST derived | L3 running |
| L5 | audit_findings | ast-grep YAML rules | ast-grep binary available |
| L6 | SCIP semantic graph (index.scip) | scip-java / scip-python / tree-sitter | index_tier=heavy |

The sales tiers map onto these layers as follows:

| Sales Tier | Actual Layers |
|---|---|
| Zero index | Most of L1/L2/L3 is not written; queries fall back to LLM / grep |
| Light index | L1+L2 always on; code-aware adds L3+L4+L5 |
| Heavy index | Light index plus L6 SCIP full graph for supported languages |
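Read together, the two tables amount to a small decision rule. The following sketch encodes one reading of it. Only `parser_id=code-aware` and `index_tier=heavy` appear in the tables; the function name `active_layers`, the `"zero"` and `"light"` tier values, the other parameter names, the choice to gate L5 on both code-aware parsing and ast-grep availability, and the simplification that zero index writes no layers at all are assumptions.

```python
def active_layers(
    index_tier: str,
    parser_id: str = "default",
    ast_grep_available: bool = False,
    scip_supported: bool = True,
) -> list[str]:
    """Return the index layers that would be written for a project (sketch)."""
    if index_tier not in {"zero", "light", "heavy"}:
        raise ValueError(f"unknown index tier: {index_tier!r}")
    if index_tier == "zero":
        # Simplification: the table says most layers are not written.
        return []

    layers = ["L1", "L2"]                 # text chunks and BM25 are always on
    if parser_id == "code-aware":
        layers += ["L3", "L4"]            # AST chunks, then derived keyword fields
        if ast_grep_available:
            layers.append("L5")           # audit findings need the ast-grep binary
    if index_tier == "heavy" and scip_supported:
        layers.append("L6")               # SCIP graph for supported languages
    return layers


if __name__ == "__main__":
    print(active_layers("light"))
    print(active_layers("light", parser_id="code-aware", ast_grep_available=True))
    print(active_layers("heavy", parser_id="code-aware", ast_grep_available=True))
```

Writing the rule out this way makes the dependency chain visible: L4 exists only when L3 runs, and L6 is an addition on top of the light-index layers rather than a replacement for them.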
Relationship to L1+L2 Dual-Layer Code Review
Our AI Code Review system also uses L1 / L2 — but those are review tiers, not index tiers:
- Code review L1: Objective static scanning (cppcheck / Checkstyle / ruff, etc.)
- Code review L2: Semantic review (LLM + KB + LTM)
- Index L1–L6: How code enters the knowledge base
Don't confuse them. L1/L2 code review can run at any index tier — but L2 review of ASIL-D modules only makes sense paired with heavy index.
FAQ
Q: Can I go straight from zero to heavy index? A: Yes. Index tier is a project-level setting that can be switched online. Switching to heavy triggers a full half-compilation pass.
Q: My repo is very large (1M+ lines). Can heavy index handle it? A: Yes, but initial setup may take several hours. We recommend starting with your core modules (100–500K lines) on heavy, with the rest on light.
Q: Does private deployment support heavy index? A: Yes. SCIP indexing is fully local — no data leaves the customer's network.
Q: How does incremental re-indexing work? A: Light index is truly incremental (file change → seconds). Heavy index uses batch incremental (per commit batch or scheduled full rebuild).
Q: How does this compare to Sourcegraph? A: Sourcegraph has excellent code indexing and we're aligned with it on capabilities (we use the same SCIP toolchain in some scenarios). But combo agent is a full Agent platform combining LLM retrieval + team memory + business rules — not just "code search."
Full capability details: Code Index Three Tiers (docs/concepts/code-index).
Written by: Nox-Lumen Tech-team
Published: May 14, 2026