Case study: Full-branch heavy index vs zero-index
Background
A customer piloted Kotlin review on Gitee repo uu5208/simpler-robot, branch v4-dev, running two paths in parallel:
- A. Heavy-index deep review: create KB (code-aware, index_tier=heavy) → import → parse → audit batch → sample semantic / navigate checks → HTML report
- B. Zero-index quick pass: no KB; gitee_get_tree + gitee_get_file_contents read sources → review per code-review dimensions → JSON / MD / HTML
These are the two canonical shapes for code review on the platform—useful for architecture / procurement decisions.
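As a rough sketch, the two paths can be written down as plain data so they are easy to compare side by side. The step names come from the descriptions above; the dictionary layout, the needs_kb flag, and the describe helper are illustrative assumptions, not part of the platform.

```python
# Illustrative only: step names are copied from the path descriptions,
# the data layout and helper are assumptions made for this sketch.
PATHS = {
    "A": {
        "name": "heavy-index deep review",
        "needs_kb": True,
        "steps": [
            "create KB (code-aware, index_tier=heavy)",
            "import",
            "parse",
            "audit batch",
            "sample semantic / navigate checks",
            "HTML report",
        ],
    },
    "B": {
        "name": "zero-index quick pass",
        "needs_kb": False,
        "steps": [
            "gitee_get_tree",
            "gitee_get_file_contents",
            "review per code-review dimensions",
            "JSON / MD / HTML",
        ],
    },
}


def describe(path_id: str) -> str:
    """Render one path as a single arrow-separated line."""
    path = PATHS[path_id]
    return f"{path_id}. {path['name']}: " + " -> ".join(path["steps"])


print(describe("A"))
print(describe("B"))
```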
Side-by-side
| Dimension | A. Heavy-index deep | B. Zero-index quick |
|---|---|---|
| Coverage | Whole simbot-cores/ subtree (97 files / 529 KB) | Four core .kt files (~31 KB) |
| Setup | KB create + import + parse (minutes to low tens of minutes) | Credentials + tree fetch (seconds) |
| Retrieval | audit / hybrid / navigate / semantic | LLM reads files only |
| Typical questions | “What high findings exist repo-wide?” “Who calls this symbol?” | “What’s wrong in these four files?” |
| Reuse | KB persists—each later review reuses the index (incremental sync) | Re-fetch every run |
| Artifacts | HTML dashboard + audit hit list + key call chains | JSON + Markdown + HTML |
| Observed findings (B run) | — (A still parsing) | 17 — fatal:0 / severe:3 / general:8 / minor:6 |
| Top risks (B) | — | ① coroutine scope leak risk; ② missing safety checks in event dispatcher; ③ interface dependency direction mismatch |
A goes deeper but costs setup time; B is near-instant but inspects only the files you tell the LLM to open. Pick based on product needs; see “When to pick A vs B”.
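To put the coverage gap in numbers, the figures in the table can be compared directly. This is a back-of-the-envelope calculation; it only compares sizes and does not assume anything about which four files path B read.

```python
# Figures from the side-by-side table.
heavy_files, heavy_kb = 97, 529  # path A: simbot-cores/ subtree
quick_files, quick_kb = 4, 31    # path B: four core .kt files (~31 KB)

file_share = quick_files / heavy_files
size_share = quick_kb / heavy_kb

print(f"files: {file_share:.1%}, size: {size_share:.1%}")
# prints "files: 4.1%, size: 5.9%"
```

In other words, the quick pass looked at roughly one twentieth of what the heavy index imported.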
When to pick A vs B
| Need | Path |
|---|---|
| Pre-release whole-repo inventory | A |
| Onboarding / compliance health check for a new repo | A |
| Long-lived “review → feedback → LTM” loop | A |
| Spot-check a vendor drop or a handful of files | B |
| Huge repo or constrained network—KB cost too high | B |
| No PR/MR but need a fast HTML report | B (lowest cost) |
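The table can be read as a small decision rule. The sketch below is one possible encoding: the ReviewNeed fields and the choose_path function are hypothetical names, and letting the KB-cost constraint override the other needs is an assumption the table does not state.

```python
from dataclasses import dataclass


@dataclass
class ReviewNeed:
    whole_repo: bool = False        # pre-release inventory or new-repo health check
    long_lived_loop: bool = False   # ongoing review -> feedback -> LTM loop
    kb_cost_too_high: bool = False  # huge repo or constrained network


def choose_path(need: ReviewNeed) -> str:
    """Return "A" (heavy index) or "B" (zero index) for a given need."""
    # Assumption: if building the KB is not affordable, B wins regardless.
    if need.kb_cost_too_high:
        return "B"
    if need.whole_repo or need.long_lived_loop:
        return "A"
    # Spot checks and quick reports with no PR/MR default to the cheaper path.
    return "B"


print(choose_path(ReviewNeed(whole_repo=True)))  # A
print(choose_path(ReviewNeed()))                 # B
```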
A. Heavy-index — prompt template
Scenario C (full-branch). Let it run end-to-end—heavy indexing may take 10–15 minutes; have the agent poll without pausing for “continue?” prompts.
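The "poll without pausing" behaviour amounts to a bounded polling loop. The sketch below assumes a caller-supplied get_status function that returns the string "done" when parsing has finished; that function, the status value, the 30-second interval, and the 20-minute timeout are all assumptions for illustration, not platform API.

```python
import time
from typing import Callable


def wait_until_parsed(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 20 * 60,  # assumed headroom above the 10-15 minute estimate
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until get_status() reports "done"; return False on timeout."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if get_status() == "done":
            return True
        sleep(interval_s)  # no user confirmation between polls
    return False
```

Passing sleep and clock as parameters keeps the loop easy to exercise without real waiting.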
B. Zero-index — prompt template
Use when “I only need these files; skip KB uploads.” No kb_create / kb_upload / kb_parse / import_from_gitee calls; only gitee_get_*.
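One way to enforce the "only gitee_get_*" rule is to check a planned list of tool calls against that prefix before running it. The tool names are the ones mentioned above; the helper functions and the idea of a pre-flight check are assumptions made for this sketch.

```python
from typing import List

ALLOWED_PREFIX = "gitee_get_"  # zero-index path may only read from Gitee


def is_allowed_in_zero_index(tool_name: str) -> bool:
    """True if the tool is a read-only gitee_get_* call."""
    return tool_name.startswith(ALLOWED_PREFIX)


def disallowed_calls(tool_calls: List[str]) -> List[str]:
    """Return the planned calls that would violate the zero-index rule."""
    return [name for name in tool_calls if not is_allowed_in_zero_index(name)]


plan = ["gitee_get_tree", "gitee_get_file_contents", "kb_create"]
print(disallowed_calls(plan))  # ['kb_create']
```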
Ground truth (complete B-path run)
| Item | Value |
|---|---|
| Files reviewed | 4 |
| Total findings | 17 |
| Severity mix | fatal:0 · severe:3 · general:8 · minor:6 |
| Top 3 | ① coroutine scope leak ② dispatcher safety gaps ③ interface dependency direction mismatch |
| Artifacts | JSON / Markdown / HTML |
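A quick consistency check confirms that the severity mix adds up to the reported total. The dictionary below simply restates the table values.

```python
# Values from the ground-truth table.
severity_mix = {"fatal": 0, "severe": 3, "general": 8, "minor": 6}
total_findings = 17

# 0 + 3 + 8 + 6 = 17, so the breakdown matches the total.
assert sum(severity_mix.values()) == total_findings
print(sum(severity_mix.values()))  # 17
```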
Path A imported 97 files / 529 KB; deep audit / navigate outputs land once parsing completes. The upside is that the index is built once and reused by later reviews through incremental sync.
Security reminder
Never paste PATs, passwords, or API keys into prompts.
- Credential tool: manage_credential store …
- Later prompts reference purpose scopes only (scm/gitee/tenant); the runtime injects secrets
- Tokens stay out of LLM context and session logs
The Token: lines in the templates are placeholders; replace them with “already stored via manage_credential” in real use.
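As a safety net, a prompt can be scrubbed before it is sent so that any Token: line carries the placeholder text rather than a real secret. This is a minimal sketch: the scrub_token_lines helper is hypothetical, and the Repo: and Branch: lines in the example are made-up template fields used only to show the effect.

```python
import re

PLACEHOLDER = "Token: already stored via manage_credential"
# Match a whole line that starts with "Token:" (optionally indented).
TOKEN_LINE = re.compile(r"^[ \t]*Token:.*$", re.MULTILINE)


def scrub_token_lines(prompt: str) -> str:
    """Replace every Token: line with the placeholder wording."""
    return TOKEN_LINE.sub(PLACEHOLDER, prompt)


# Hypothetical template text with a fake token value.
prompt = "Repo: uu5208/simpler-robot\nToken: abc123\nBranch: v4-dev"
print(scrub_token_lines(prompt))
```

This only catches secrets on Token: lines; it does not replace storing credentials through manage_credential in the first place.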