skill-refinement
Capabilities
When a skill’s output misses expectations, skill-refinement helps you:
- Run layered diagnostics over the execution history to find the root cause
- Judge whether the cause is in the skill instructions (fixable by editing the skill)
- If yes, propose improvements for user review
- Persist changes only after explicit user confirmation
Core constraints (MANDATORY)
| # | Constraint | Description |
|---|---|---|
| M1 | Only touch the skill the user named | If the user asks to analyze X, read and edit X only |
| M2 | Show first; do not auto-save | After analysis, output the proposed edits, then stop and wait. Call skills_update only after the user explicitly replies “save”, “confirm”, or “OK” |
| M3 | If root cause is not in the skill, say so | If execution drift or model limits are the issue, do not force a skill rewrite |
| M4 | Do not paste raw user chatter into the skill | Distill the feedback into generalized rules and write those instead |
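Constraint M2 can be pictured as a small gate in front of the save step. The sketch below is illustrative only: `maybe_persist`, `is_explicit_confirmation`, and the `update_fn` callback are hypothetical names, and `update_fn` merely stands in for skills_update, whose real signature is not described here.

```python
from typing import Callable

# Replies that count as explicit confirmation (the three words named in M2).
CONFIRM_WORDS = {"save", "confirm", "ok"}


def is_explicit_confirmation(reply: str) -> bool:
    """Return True only when the whole reply is one of the confirmation words."""
    return reply.strip().strip(".!").lower() in CONFIRM_WORDS


def maybe_persist(
    skill_name: str,
    proposed_text: str,
    user_reply: str,
    update_fn: Callable[[str, str], None],  # hypothetical stand-in for skills_update
) -> bool:
    """Call update_fn only after explicit confirmation; report whether it ran."""
    if not is_explicit_confirmation(user_reply):
        return False  # nothing is saved; the proposal stays a proposal
    update_fn(skill_name, proposed_text)
    return True
```

Matching the whole reply, rather than searching for a substring, is a deliberately strict choice in this sketch: a reply such as “looks ok but change step 2” should not trigger a save.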
Two-layer analysis
Layer 1: digest (unified_search(action="get_round"))
Quickly scan episode summaries: what the Agent did each round, which tools ran, what came back.
Layer 2: detail (get_round_detail)
For suspicious rounds, pull full reasoning traces and tool I/O: arguments, raw returns, interpretations, final outputs.
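The two layers amount to a cheap scan followed by a selective deep read. A minimal sketch, assuming each layer can be wrapped as a plain callable: `get_digest` stands in for unified_search(action="get_round"), `get_detail` stands in for get_round_detail, and `is_suspicious` is a hypothetical predicate; none of these wrappers or their argument shapes come from the actual tool definitions.

```python
from typing import Callable, Dict, List


def two_layer_analysis(
    round_ids: List[int],
    get_digest: Callable[[int], Dict],     # assumed wrapper around the Layer 1 call
    get_detail: Callable[[int], Dict],     # assumed wrapper around the Layer 2 call
    is_suspicious: Callable[[Dict], bool]  # hypothetical filter on a digest
) -> Dict[int, Dict]:
    """Scan every round's digest, then fetch full detail only for suspicious rounds."""
    details: Dict[int, Dict] = {}
    for rid in round_ids:
        digest = get_digest(rid)           # Layer 1: summary of what happened
        if is_suspicious(digest):
            details[rid] = get_detail(rid)  # Layer 2: full trace and tool I/O
    return details
```

Keeping the detail call behind the filter is the point of the layering: full reasoning traces are pulled only for the rounds that the digest flags.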
Root-cause decision tree
[Figure: root-cause decision tree diagram]
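One way to express such a decision in code, using only the three cause categories named in constraint M3 (skill instructions, execution drift, model limits). The branching order and the two boolean inputs are assumptions made for illustration, not a transcription of the diagram.

```python
from enum import Enum


class RootCause(Enum):
    SKILL_INSTRUCTIONS = "skill instructions"
    EXECUTION_DRIFT = "execution drift"
    MODEL_LIMITS = "model limits"


def classify_root_cause(instructions_followed: bool, instructions_adequate: bool) -> RootCause:
    """Assumed branching: check adherence first, then the instructions themselves."""
    if not instructions_followed:
        # The agent departed from what the skill says.
        return RootCause.EXECUTION_DRIFT
    if not instructions_adequate:
        # The agent did what the skill says, and the skill was wrong or incomplete.
        return RootCause.SKILL_INSTRUCTIONS
    # Instructions were adequate and followed, yet the output still missed.
    return RootCause.MODEL_LIMITS


def should_propose_edit(cause: RootCause) -> bool:
    """Per M3, only a cause inside the skill justifies a proposed rewrite."""
    return cause is RootCause.SKILL_INSTRUCTIONS
```

Under this sketch, only one of the three outcomes leads to a proposed edit; the other two end with a report that the skill itself is not at fault.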
How to trigger
Natural language examples:
- “Analyze what went wrong with skill X”
- “Why didn’t code-review catch that memory leak this time?”
- “Improve novelty-search retrieval strategy”
Division of labor with skill-architect
| Dimension | skill-architect | skill-refinement |
|---|---|---|
| Scenario | Creating a new skill from scratch | Iterating on an existing skill |
| Data | Requirements discussion | Execution history (round detail) |
| Tool | skills_create | skills_update |
| Output | New SKILL.md | Incremental diff |