Case study: Full-branch heavy index vs zero-index

Background

A customer piloted Kotlin review on Gitee repo uu5208/simpler-robot, branch v4-dev, running two paths in parallel:

A. Heavy-index deep review: create KB (code-aware, index_tier=heavy) → import → parse → audit batch → sample semantic / navigate checks → HTML report
B. Zero-index quick pass: no KB—gitee_get_tree + gitee_get_file_contents read sources → review per code-review dimensions → JSON / MD / HTML

These are the two canonical shapes for code review on the platform—useful for architecture / procurement decisions.

Side-by-side

Dimension	A. Heavy-index deep	B. Zero-index quick
Coverage	Whole `simbot-cores/` subtree (97 files / 529 KB)	Four core `.kt` files (~31 KB)
Setup	KB create + import + parse (minutes to low tens of minutes)	Credentials + tree fetch (seconds)
Retrieval	audit / hybrid / navigate / semantic	LLM reads files only
Typical questions	“What high findings exist repo-wide?” “Who calls this symbol?”	“What’s wrong in these four files?”
Reuse	KB persists—each later review reuses the index (incremental sync)	Re-fetch every run
Artifacts	HTML dashboard + audit hit list + key call chains	JSON + Markdown + HTML
Observed findings (B run)	— (A still parsing)	17 — fatal:0 / severe:3 / general:8 / minor:6
Top risks (B)	—	① coroutine scope leak risk; ② missing safety checks in event dispatcher; ③ interface dependency direction mismatch

A goes deeper but costs setup time; B is near-instant but inspects only the files you tell the LLM to open. Pick based on what the review has to answer; see “When to pick A vs B”.

When to pick A vs B

Need	Path
Pre-release whole-repo inventory	A
Onboarding / compliance health check for a new repo	A
Long-lived “review → feedback → LTM” loop	A
Spot-check a vendor drop or a handful of files	B
Huge repo or constrained network—KB cost too high	B
No PR/MR but need a fast HTML report	B (lowest cost)

A. Heavy-index — prompt template

Scenario C (full-branch). Let it run end-to-end—heavy indexing may take 10–15 minutes; have the agent poll without pausing for “continue?” prompts.

Run a full-repo code review (self-hosted KB → import → parse → L2 → HTML).

Repo: https://gitee.com/<your-org>/<your-repo>.git (branch=<your-branch>)
Token: <REDACTED — inject via manage_credential, never put in plaintext prompts>

Pipeline (do not pause mid-flight—finish all steps):

1. manage_credential store gitee token (if unset)

2. kb_create:
   - name: "<repo>-review-<timestamp>"
   - parser_id: "code-aware"
   - parser_config: {"index_tier": "heavy"}
   Capture kb_id—use downstream immediately.

3. import_from_gitee dryrun → execute back-to-back:
   - url, kb_id, mode="dryrun"
   - values={"ref":"<branch>","sub_path":"<sub-path>"}
   - file_globs=["*.kt"], limit=50
   After dry-run counts + dryrun_id, rerun same args with mode="execute", parse_intent="auto".

4. Wait for parsing (~10–15 min is normal—do not abort):
   - Poll kb_doc_status every 30s (no tight loops)
   - Done when kb_doc_list shows docs[].status all done
   - Hierarchy check: locations should include "<sub-path>/..." prefixes (field `location`, not just `name`)

5. code-review L2 (scenario C, skip gitee_get_tree—use KB only):
   a. unified_search code_search mode=audit, kb_ids=[new KB], top_k=50
      aggregate high/medium by rule_id
   b. mode=hybrid top_k=10 sample queries:
      - hard-coded password token secret credential
      - SQL concatenation injection
      - random crypto weak cipher
      - leaked resources leak close cleanup
      - thread safety concurrency data race
   c. Navigate key audit symbols → mode=navigate query="refs:<symbol>"
   d. Skip KB requirement lookup (none bound) and skip LTM

6. HTML report write_file → outputs/code_review_visual_report.html:
   - Overview: file counts, langs, kb_id + name
   - Audit table: rule_id × severity × count
   - Top 10 risks: file:line / snippet / fix / confidence
   - Navigate call-graph excerpts

Guardrails:
- Do not repeatedly kb_parse queued docs—it duplicates tasks
- Any subjective judgement must reconcile with code_search; audit findings are ground truth
- No interim “progress reports”—finish steps 1–6 then summarize
- Never mutate upstream repo
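
The waiting rule in step 4 (poll every 30s, no tight loops, done when every document reports done) can be sketched as below. This is a minimal sketch under stated assumptions: `fetch_statuses` is a hypothetical callable standing in for a kb_doc_list / kb_doc_status call and is assumed to return one status string per document, and the 20-minute timeout is an illustrative ceiling chosen to sit above the 10–15 minutes described as normal.

```python
import time
from typing import Callable, Iterable


def wait_until_parsed(
    fetch_statuses: Callable[[], Iterable[str]],
    interval_s: float = 30.0,
    timeout_s: float = 20 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until every document status is "done"; return the number of polls.

    fetch_statuses is a hypothetical stand-in for the platform's status call.
    """
    polls = 0
    waited = 0.0
    while True:
        statuses = list(fetch_statuses())
        polls += 1
        # An empty list means nothing has been imported yet, so keep waiting.
        if statuses and all(s == "done" for s in statuses):
            return polls
        if waited >= timeout_s:
            raise TimeoutError(f"parsing not finished after {waited:.0f}s: {statuses}")
        sleep(interval_s)  # fixed interval between polls, never a tight loop
        waited += interval_s
```

The loop only reads status. It never re-submits parse tasks, which matches the guardrail against calling kb_parse again on queued documents.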

B. Zero-index — prompt template

Use this when the request is “I only need these files; skip KB uploads.” The agent makes no kb_create / kb_upload / kb_parse / import_from_gitee calls, only gitee_get_* reads.

code-review scenario C full-branch review without KB ingestion.

Repo: https://gitee.com/<your-org>/<your-repo>
Branch: <branch>
Scope: *.kt under <sub-path>/<...>
Token: <REDACTED — manage_credential scm/gitee/tenant>

Steps:

1. gitee_get_tree(owner="<owner>", repo="<repo>", ref="<branch>", recursive=True)
   filter target .kt files, max 10

2. gitee_get_file_contents(...) per file — full text

3. Walk code-review Step-3 dimensions per file:
   - code_standard naming/comments/SRP
   - security auth/secrets/swallowed exceptions
   - performance blocking IO / N+1 / concurrency hazards
   - design_consistency contracts / dependency direction

4. Skip Step 2 (KB req search) & Step 4 (LTM) — no KB bound

5. Emit JSON + Markdown + HTML under outputs/; every finding needs reasoning + confidence

Rules:
- Never call kb_create / kb_upload / kb_parse / import_from_gitee — gitee reads only
- Paste raw errors verbatim
- Finish with severity histogram + titles for top 3 findings

Ground truth (complete B-path run)

Item	Value
Files reviewed	4
Total findings	17
Severity mix	fatal:0 · severe:3 · general:8 · minor:6
Top 3	① coroutine scope leak ② dispatcher safety gaps ③ interface dependency direction mismatch
Artifacts	JSON / Markdown / HTML
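
The severity mix in the table can be tallied from a findings list as sketched below. The `severity` field name is an assumption about the JSON artifact, and the findings are placeholders that reproduce only the counts reported above, not the actual review output.

```python
from collections import Counter

SEVERITIES = ("fatal", "severe", "general", "minor")


def severity_histogram(findings):
    """Count findings per severity, keeping zero counts and a fixed order."""
    counts = Counter(f["severity"] for f in findings)
    unknown = set(counts) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severity labels: {sorted(unknown)}")
    return {level: counts.get(level, 0) for level in SEVERITIES}


def format_histogram(hist):
    """Render the histogram in the 'fatal:0 / severe:3 / ...' style."""
    return " / ".join(f"{level}:{n}" for level, n in hist.items())


# Placeholder findings that only reproduce the counts from the table.
findings = (
    [{"severity": "severe"}] * 3
    + [{"severity": "general"}] * 8
    + [{"severity": "minor"}] * 6
)
hist = severity_histogram(findings)
print(format_histogram(hist))  # fatal:0 / severe:3 / general:8 / minor:6
print(sum(hist.values()))      # 17
```

Keeping the zero-count levels in the output makes it explicit that no fatal findings were reported, rather than leaving the reader to infer it from a missing key.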

Path A imported 97 files / 529 KB; its audit and navigate outputs become available once parsing completes. The upside is that the index is built once and then reused by later reviews through incremental sync.

Security reminder

Never paste PATs, passwords, or API keys into prompts.

Store secrets with the credential tool: manage_credential store …
Later prompts reference only the purpose scope (scm/gitee/tenant); the runtime injects the secret.
Tokens stay out of the LLM context and session logs.

The Token: lines in the templates are placeholders—replace with “already stored via manage_credential” in real use.
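
One way to enforce the rule before a prompt is sent is a small pre-flight check, sketched below. It is a heuristic under stated assumptions, not a platform feature: the key names, the 32-character threshold, and the `find_suspect_lines` helper are all hypothetical, and a check like this will miss some secrets and flag some harmless strings.

```python
import re

# Heuristic patterns only; they are assumptions, not an exhaustive detector.
KEYED = re.compile(r"(?i)\b(token|password|passwd|api[_-]?key|secret)\b\s*[:=]\s*(.+)")
LONG_OPAQUE = re.compile(r"\b[A-Za-z0-9_\-]{32,}\b")


def find_suspect_lines(prompt: str) -> list[int]:
    """Return 1-based numbers of lines that look like they carry a secret."""
    suspects = []
    for number, line in enumerate(prompt.splitlines(), start=1):
        keyed = KEYED.search(line)
        if keyed:
            value = keyed.group(2).strip()
            # Placeholders ("<REDACTED ...>") and references to the
            # credential tool are fine; anything else is flagged.
            if not (value.startswith("<") or "manage_credential" in value):
                suspects.append(number)
                continue
        # Long opaque runs often indicate a pasted token or key.
        if LONG_OPAQUE.search(line):
            suspects.append(number)
    return suspects


prompt = "\n".join([
    "Repo: https://gitee.com/acme/demo.git",
    "Token: <REDACTED>",
    "Token: already stored via manage_credential",
    "Token: 0123456789abcdef0123456789abcdef",
    "password = hunter2",
])
print(find_suspect_lines(prompt))  # [4, 5]
```

A check like this complements manage_credential rather than replacing it: the goal is to catch an accidental paste before the text reaches the LLM context or the session log.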
