Code Index Three Tiers: When Heavy Indexing is Actually Worth It

Zero / light / heavy index: how much indexing time you invest determines how deep your answers go. Light index handles daily PRs with responses in seconds; heavy index is only needed for work such as ASIL-D compliance audits. This post explains which tier fits which scenario.


Code is not a document. Stuffing code into a vector store works up to a point, but answering "who calls this function?" or "which tests break if I change this line?" is completely beyond vector retrieval. combo agent splits "getting code into the knowledge base" into three indexing tiers — how much initialization time you invest determines the depth of questions you can answer.

One Table to Understand All Three Tiers

| Tier | What You Need to Do | When It's Ready | Questions It Can Answer | Who It's For |
| --- | --- | --- | --- | --- |
| Zero index | Just drop the repo and search | Instant | "Does this repo have authentication code?" "Which files mention this API?" | Ad-hoc debugging, newly inherited code, small repos you won't maintain long-term |
| Light index | Wait for initial sync (minutes, depends on repo size) | Seconds after any file change | "Where is this function defined?" "Who calls it?" "What are all its imports?" | Daily code review, new-hire onboarding, the vast majority of business repos |
| Heavy index | Set up a compilable build environment first (10 minutes to hours, depends on project) | Available after full analysis completes | "Which downstreams are affected if I change this line?" "What does this inheritance hierarchy look like?" "All cross-compilation-unit references" | Legacy C/C++ projects, security audits, cross-module large refactors |

Why Not "One Tier for Everything"

Many products claim "one tier handles everything." In practice they fall short in one of two ways:

  • Not deep enough: only literal grep, can't answer call relationships
  • Too slow: every repo needs "half-compilation," initialization takes tens of minutes, and you have to wait again next time

Our judgment: index depth and initialization cost are a trade-off — picking the right tier per scenario is the engineering approach.


Zero Index: Search Right Away

No index is built. Zero index relies on syntax-level intelligent matching plus the LLM reading the relevant files directly.

Good for:

  • "What does this repo mainly do? Give me an overview"
  • "Search for code that handles timeouts"
  • "I remember there was a file called payment_*.py, help me find it"

Not good for:

  • "Who calls this function" — zero index has no call graph, it can only tell you where the name literally appears
  • Cross-language calls, cross-file inheritance inference

Practical advice: If you just want AI to "take a look at this repo," zero index is enough. Its biggest advantage is no waiting.
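As a rough illustration of what literal matching can and cannot do, the sketch below searches a tiny in-memory repo by file-name glob and by whole-word name. It is not the platform's implementation; REPO, find_files, and literal_hits are hypothetical names invented for this example.

```python
import fnmatch
import re
from pathlib import PurePosixPath

# Hypothetical in-memory repo: path -> file content.
REPO = {
    "billing/payment_stripe.py": "def charge(order):\n    return process_order(order)\n",
    "billing/orders.py": "def process_order(order):\n    # process_order retries on timeout\n    return order\n",
}


def find_files(paths, pattern):
    """Return the paths whose file name matches a glob such as 'payment_*.py'."""
    return [p for p in paths if fnmatch.fnmatchcase(PurePosixPath(p).name, pattern)]


def literal_hits(files, name):
    """Return (path, line_number) for every line that contains `name` as a whole word."""
    word = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if word.search(line):
                hits.append((path, lineno))
    return hits


FILES = find_files(REPO, "payment_*.py")
HITS = literal_hits(REPO, "process_order")
```

Here HITS contains three entries for process_order: a call site, the definition, and a comment. Nothing in the result tells them apart, which is why "who calls this function" needs at least the light index.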

Light Index: The Daily Workhorse

The platform uses local tools (tree-sitter-based syntax analysis) to build a lightweight index. It extracts the locations of functions, classes, variables, and imports for each language and stores them in the search engine.
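To make the idea of a syntax-level index concrete, here is a minimal sketch that extracts definition, import, and call-site records from one Python file. It uses Python's standard ast module as a stand-in for tree-sitter, so it only handles Python; the record layout and the names index_source and SAMPLE are assumptions made for this example, not the platform's schema.

```python
import ast


def index_source(path, source):
    """Extract (kind, name, path, line) records from one Python source file."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append(("function", node.name, path, node.lineno))
        elif isinstance(node, ast.ClassDef):
            records.append(("class", node.name, path, node.lineno))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                records.append(("import", alias.name, path, node.lineno))
        elif isinstance(node, ast.ImportFrom):
            records.append(("import", node.module or "", path, node.lineno))
        elif isinstance(node, ast.Call):
            func = node.func
            # Calls are matched by name only: foo(...) or obj.foo(...).
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if name:
                records.append(("call", name, path, node.lineno))
    # Sort by line so the output order does not depend on traversal order.
    return sorted(records, key=lambda r: (r[3], r[0], r[1]))


# Hypothetical sample file.
SAMPLE = (
    "import redis\n"
    "\n"
    "class OrderService:\n"
    "    def process_order(self, order):\n"
    "        return validate(order)\n"
)
RECORDS = index_source("orders.py", SAMPLE)
```

Because calls are recorded by name without resolving types, a sketch like this can answer "where is it defined and who calls it" but cannot tell two same-named methods on different classes apart.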


Key Upgrade (M4 / R8+)

  • Change one file and the new content is searchable within seconds; this used to take minutes
  • Old "batch wait / batch rebuild" mechanism removed; write code and query immediately
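One common way to get per-file updates like this is to key the index by file path and re-extract only the file whose content changed. The sketch below shows that general technique under stated assumptions; it is not a description of the platform's internals, and IncrementalIndex, def_names, and the regex-based extractor are hypothetical.

```python
import hashlib
import re


def def_names(text):
    """Toy extractor: names introduced by `def` at the start of a line."""
    return re.findall(r"^\s*def\s+(\w+)", text, flags=re.MULTILINE)


class IncrementalIndex:
    """Per-file symbol index that re-extracts a file only when its content changes."""

    def __init__(self):
        self._digests = {}  # path -> sha256 of the last indexed content
        self._symbols = {}  # path -> set of symbols extracted from that file

    def update(self, path, text, extract=def_names):
        """Re-index one file; return False if the content is unchanged."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self._digests.get(path) == digest:
            return False
        self._digests[path] = digest
        # Replacing the whole entry also drops symbols that were deleted from the file.
        self._symbols[path] = set(extract(text))
        return True

    def lookup(self, symbol):
        """Return the sorted list of files that currently define `symbol`."""
        return sorted(p for p, syms in self._symbols.items() if symbol in syms)
```

Since each update touches a single path, the cost of a change depends on the size of that file rather than the size of the repo.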

Good for

  • "Where is processOrder defined and who calls it?"
  • "What does this class inherit from? What are its subclasses?"
  • "How many places in the whole project use redis.Pipeline?"
  • "List all external modules imported by this file"

Not good for

  • Questions requiring type inference (C++ template instantiation, Python duck-typing runtime types)
  • Cross-compilation-unit, cross-link-boundary references
  • Impact questions on ASIL-D modules, such as "which downstreams are affected if I change this line?"

Heavy Index: Audits / Refactors / Legacy C/C++

Heavy index actually "half-compiles" the project to build a fully typed code graph (based on the SCIP semantic graph). It is the most expensive tier and the most accurate.

Right Scenarios

| Scenario | Why Heavy Index Is Required |
| --- | --- |
| Legacy C/C++ projects | Macros, templates, and include graphs are complex; without type info, you can't really audit |
| Security audits | Track user input flow to sensitive functions (taint analysis) |
| Cross-module large refactors | Changing a core interface requires seeing all downstreams first |
| Code compliance review | MISRA / ASPICE / ISO 26262 require "complete call chain traceability" |
| ASIL-D safety analysis | SOTIF / FMEA need line-level impact propagation |

Prerequisites

  • Your project can build in our environment (CMake / Bazel / Maven, etc.)
  • You can accept a one-time initialization that takes from 10 minutes to several hours
  • Project language is in the SCIP support matrix (Java / Python / Go / Rust / TypeScript / C/C++ partial)

Real-World Example

On automotive ECU code, heavy index can answer: "I changed can_send_frame — which CAN buses are ultimately affected, and which ASIL-level requirements do they correspond to?" — light and zero index cannot answer this.
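The core of an impact question like this is a transitive walk over the call graph in the caller direction. The sketch below shows that traversal on a hand-written edge list; in practice the edges would come from the semantic graph. Apart from can_send_frame, every function name here is made up, and mapping the result to CAN buses or ASIL-level requirements would need extra metadata that this example does not model.

```python
from collections import deque


def affected_by(call_edges, changed):
    """Return every symbol that directly or transitively calls `changed`."""
    callers = {}
    for caller, callee in call_edges:
        callers.setdefault(callee, set()).add(caller)

    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    # The changed symbol itself is not reported as "affected".
    return seen - {changed}


# Hypothetical (caller, callee) edges.
EDGES = [
    ("can_tx_task", "can_send_frame"),
    ("diag_service", "can_tx_task"),
    ("brake_monitor", "can_send_frame"),
    ("logger", "format_msg"),
]
IMPACT = affected_by(EDGES, "can_send_frame")
```

The traversal is only as good as its edges. With name-based edges from a syntax-level index, macros, templates, and cross-compilation-unit references can be missed or mismatched, which is the gap the fully typed graph is meant to close.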

Scenario-to-Tier Recommendations

| Scenario | Recommended Tier | Notes |
| --- | --- | --- |
| Daily PR / MR code review | Light | Responds within seconds |
| New hire onboarding | Light | Call relationships are sufficient |
| Quick review of an outsourced commit | Zero | Not worth waiting |
| Browsing an open-source repo | Zero | Instant |
| ASPICE compliance audit / BP assessment | Heavy | Needs complete call chains |
| Cross-module large refactor | Heavy | Must see all downstreams |
| Security audit / taint analysis | Heavy | Cross-compilation-unit |
| SOTIF / FMEA impact analysis | Heavy | Line-level precision |

Ragbase Code Index Full Stack (L1–L6)

The sales-facing "zero / light / heavy" tiers are product packaging. The real technical layers are six deep:

| Layer | Index Type | Source | Trigger |
| --- | --- | --- | --- |
| L1 | Text chunks + embeddings | Any parser | Always |
| L2 | BM25 full-text inverted index | Any parser | Always |
| L3 | AST chunks | tree-sitter | parser_id=code-aware |
| L4 | R8 keyword fields (functions / classes / imports / fqn) | tree-sitter AST derived | L3 running |
| L5 | audit_findings | ast-grep YAML rules | ast-grep binary available |
| L6 | SCIP semantic graph (index.scip) | scip-java / scip-python / tree-sitter | index_tier=heavy |

| Sales Tier | Actual Layers |
| --- | --- |
| Zero index | Most L1/L2/L3 not written; falls back to LLM / grep |
| Light index | L1+L2 always on; code-aware adds L3+L4+L5 |
| Heavy index | Light index plus L6 SCIP full graph for supported languages |
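Read together, the two tables amount to a small decision rule. The sketch below is one possible reading of that rule, not the platform's code: index_tier and parser_id are the settings named in the table, while the function name and the ast_grep_available and scip_supported flags are hypothetical. It also simplifies zero index to "no layers written", although the table only says most are not, and it assumes L5 is produced only for code-aware parsing.

```python
def active_layers(index_tier, parser_id=None, ast_grep_available=False, scip_supported=False):
    """Return the index layers that would be written under one reading of the tables."""
    if index_tier == "zero":
        # Simplification: zero index falls back to LLM / grep.
        return []

    layers = ["L1", "L2"]  # text chunks + embeddings, BM25: always on
    if parser_id == "code-aware":
        layers += ["L3", "L4"]  # AST chunks, plus keyword fields derived from them
        if ast_grep_available:
            layers.append("L5")  # audit findings need the ast-grep binary
    if index_tier == "heavy" and scip_supported:
        layers.append("L6")  # SCIP graph, only for supported languages
    return layers
```

For example, a light-tier project parsed as code-aware with ast-grep installed would get L1 through L5, and switching the same project to heavy adds L6 only if its language has SCIP support.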

Relationship to L1+L2 Dual-Layer Code Review

Our AI Code Review system also uses L1 / L2 — but those are review tiers, not index tiers:

  • Code review L1: Objective static scanning (cppcheck / Checkstyle / ruff, etc.)
  • Code review L2: Semantic review (LLM + KB + LTM)
  • Index L1–L6: How code enters the knowledge base

Don't confuse them. L1/L2 code review can run at any index tier — but L2 review of ASIL-D modules only makes sense paired with heavy index.

FAQ

Q: Can I go straight from zero to heavy index? A: Yes. Index tier is a project-level setting that can be switched online. Switching to heavy triggers a full half-compilation pass.

Q: My repo is very large (1M+ lines). Can heavy index handle it? A: Yes, but initial setup may take several hours. We recommend starting with your core modules (100–500K lines) on heavy, with the rest on light.

Q: Does private deployment support heavy index? A: Yes. SCIP indexing is fully local — no data leaves the customer's network.

Q: How does incremental re-indexing work? A: Light index is truly incremental (a file change is searchable within seconds). Heavy index uses batch incremental updates (per commit batch or a scheduled full rebuild).

Q: How does this compare to Sourcegraph? A: Sourcegraph has excellent code indexing and we're aligned with it on capabilities (we use the same SCIP toolchain in some scenarios). But combo agent is a full Agent platform combining LLM retrieval + team memory + business rules — not just "code search."

Full capability details: Code Index Three Tiers (docs/concepts/code-index).

Written by

Nox-Lumen Tech-team

Published

May 14, 2026