Memory system
Why Memory matters
Session context is short-lived, but users need to:
- Keep cross-session facts (customer preferences, past decisions)
- Attach domain knowledge bases (patent corpora, automotive standards, policies)
- Let Agents retrieve relevant background on demand
Memory tiers
| Tier | Lifecycle | Typical content |
|---|---|---|
| Session memory | Current session | Dialogue history, working variables |
| User memory | Cross-session, per user | Preferences, terminology |
| Tenant memory | Cross-user, per org | Policies, glossaries, templates |
| Knowledge base | External | Domain docs, references, historical outputs |
Retrieval
The system supports:
- Semantic search — vector similarity
- Keyword search — exact match
- Structured queries — metadata (time, author, tags)
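As a rough illustration of how the three retrieval modes differ, here is a minimal in-memory sketch. The `MemoryItem` class, the function names, and the sample data are hypothetical and are not the platform's API; a real deployment would presumably use a vector index rather than a linear scan.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class MemoryItem:
    # Hypothetical record shape, for illustration only.
    text: str
    embedding: list[float]                      # precomputed vector
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(items, query_vec, top_k=3):
    """Semantic search: rank items by vector similarity to the query."""
    ranked = sorted(items, key=lambda it: cosine(it.embedding, query_vec), reverse=True)
    return ranked[:top_k]


def keyword_search(items, term):
    """Keyword search: exact, case-sensitive substring match."""
    return [it for it in items if term in it.text]


def structured_query(items, **filters):
    """Structured query: keep items whose metadata equals every filter value."""
    return [it for it in items if all(it.metadata.get(k) == v for k, v in filters.items())]


# Made-up sample data.
items = [
    MemoryItem("Customer prefers metric units", [1.0, 0.0], {"author": "alice", "tags": "preference"}),
    MemoryItem("Glossary entry for braking distance", [0.0, 1.0], {"author": "bob", "tags": "glossary"}),
]
```

For example, `semantic_search(items, [0.9, 0.1], top_k=1)` returns the preference item, while `structured_query(items, author="bob")` returns only the glossary entry.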
Cross-session search
With the Graft skill, Agents can search and reference outputs across Sessions when the user authorizes it.
From KB to LTM
KB (Knowledge Base) and LTM (Long-Term Memory) serve different roles:
| | KB | LTM |
|---|---|---|
| Granularity | Coarse (whole doc / one Bug record) | Fine (single fact / preference) |
| Storage | Original text + vector index | Structured facts + provenance metadata |
| Purpose | Precise RAG on raw content | Retrieve processed patterns and insights |
| Write path | User import or system artifacts | Conversation or scheduled processors |
They are not substitutes but layers — KB is raw material; LTM is patterns distilled from that material.
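The difference in granularity and storage can be pictured with two record shapes. This is only a sketch: the class and field names (`KBRecord`, `LTMFact`, `source_ids`, and so on) are assumptions chosen to mirror the comparison of KB and LTM here, not the platform's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class KBRecord:
    """Coarse-grained unit: a whole document or one Bug record."""
    record_id: str
    original_text: str                                      # stored verbatim
    embedding: list[float] = field(default_factory=list)    # entry in the vector index


@dataclass
class LTMFact:
    """Fine-grained unit: a single fact or preference."""
    fact: str
    confidence: float
    source_ids: list[str] = field(default_factory=list)     # provenance metadata


# Made-up example: one raw KB record and one fact distilled from it.
bug = KBRecord("bug-101", "Parser crashes when the input file is empty")
fact = LTMFact("Parser bugs often involve empty input", 0.6, [bug.record_id])
```

The fact keeps a pointer back to the KB record it was distilled from, which is what lets the two layers coexist rather than replace each other.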
Two memory flows
LTM facts arrive through two parallel streams:
Flow 1 (live): runs per turn at fine granularity and answers “how should the Agent talk to me?”. It is carried by memory-sdk.
Flow 2 (KB batch): runs as a periodic batch over aggregated data and answers “what patterns emerged for this project / org?”. It is carried by processor-class Skills.
Processor pattern (generic template)
Any Skill that turns structured KB rows into LTM facts follows the same shape:
| Stage | Content |
|---|---|
| Trigger | Platform LTM cron (default ~6h), not real-time |
| Incremental | Only rows without ltm_extracted_at |
| Aggregation | Multi-dimensional rollups (typically 2–4 dims) |
| confidence | Baseline + count weighting + severity weighting, capped at 0.95 |
| Provenance | source_*_ids list source records |
| Update vs insert | Compare source_*_ids with those of existing facts to decide whether to update or insert |
| False-positive rollback | When source rows are invalidated, the next cron run recomputes the affected facts |
| Capacity guard | Per-run cap + max fact length + max provenance ids |
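The stages above can be sketched as one scheduled run over in-memory rows. Everything concrete here is an assumption made for illustration: the row fields (`id`, `module`, `category`, `severity`), the `source_bug_ids` name as one instance of `source_*_ids`, the limits, and the confidence weights. The confidence function is one possible reading of the confidence row, and false-positive rollback is left out to keep the sketch short.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Capacity guards; the concrete limits are assumed values.
MAX_FACTS_PER_RUN = 100
MAX_FACT_LENGTH = 200
MAX_PROVENANCE_IDS = 20


def confidence(count: int, has_high_severity: bool) -> float:
    """Baseline + count weighting + severity weighting, capped at 0.95.

    The 0.5 baseline and both weights are assumptions, not platform values.
    """
    score = 0.5 + 0.05 * count + (0.1 if has_high_severity else 0.0)
    return min(0.95, score)


def run_processor(rows: list[dict], facts: dict[tuple, dict]) -> dict[tuple, dict]:
    """One scheduled run: fold unprocessed KB rows into LTM facts, in place."""
    # Incremental: only rows without ltm_extracted_at.
    pending = [r for r in rows if r.get("ltm_extracted_at") is None]

    # Aggregation: roll up on two dimensions (module, category).
    groups = defaultdict(list)
    for row in pending:
        groups[(row["module"], row["category"])].append(row)

    stamp = datetime.now(timezone.utc).isoformat()
    for key, members in list(groups.items())[:MAX_FACTS_PER_RUN]:
        existing = facts.get(key)
        old_ids = existing["source_bug_ids"] if existing else []
        # Provenance: union of earlier and new source ids, bounded in size.
        ids = sorted(set(old_ids) | {r["id"] for r in members})[:MAX_PROVENANCE_IDS]
        high = any(r["severity"] == "high" for r in members) or bool(
            existing and existing["high_severity"]
        )
        # Update vs insert: an unchanged id list means there is nothing to write.
        if ids != old_ids:
            text = f"{len(ids)} {key[1]} bugs recorded in {key[0]}"
            facts[key] = {
                "fact": text[:MAX_FACT_LENGTH],
                "confidence": confidence(len(ids), high),
                "source_bug_ids": ids,
                "high_severity": high,
            }
        for row in members:
            row["ltm_extracted_at"] = stamp  # skipped by the next run
    return facts


# Made-up sample rows; the third one was already processed by an earlier run.
rows = [
    {"id": "b1", "module": "parser", "category": "null-deref", "severity": "high", "ltm_extracted_at": None},
    {"id": "b2", "module": "parser", "category": "null-deref", "severity": "low", "ltm_extracted_at": None},
    {"id": "b3", "module": "ui", "category": "layout", "severity": "low", "ltm_extracted_at": "2024-01-01T00:00:00+00:00"},
]
facts = run_processor(rows, {})
```

After this run, `facts` holds a single entry keyed by `("parser", "null-deref")` whose provenance lists `b1` and `b2`. Running the processor again on the same rows changes nothing, because every row now carries `ltm_extracted_at`.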
Implemented processors
| Processor | Input (KB) | Output (LTM) | Consumer |
|---|---|---|---|
| bug-import | Historical Bug records | four-dimensional bug_pattern facts | L2 code-review |
Future processors (requirement-change, review comments, incident reports, …) can follow the same pattern. New processors are Skills; create them with skill-architect, using bug-import as the model.
Why processing is scheduled
| Reason | Explanation |
|---|---|
| Patterns need volume | A single row does not show a pattern; at least N rows are needed |
| Cost control | LLM and aggregation costs are batched into cron windows instead of blocking chat |
| Avoid half-baked facts | Processing the first row in real time yields noisy facts backed by a single piece of evidence |
| Match business rhythm | Bugs / requirement batches are naturally periodic |
Related docs
- Session
- Graft
- Cron — how processors are triggered
- memory-sdk
- bug-import