Nox-Lumen Mfg

Knowledge base

Knowledge bases are the Combo Agent’s external memory. Load standards, prior projects, and industry references once; the Agent then retrieves the relevant chunks when it answers or generates content.

1. Entry point

Use the top Knowledge base tab.

KB list (My / Team + create)

Three areas:

  • Top search — by name
  • My knowledge bases — created by you
  • Team knowledge bases — shared inside the tenant

After switching tenants, the list reflects that tenant’s visibility.

2. Creating a knowledge base

Click Create knowledge base:

Field | Meaning | Tip
Name | Display name | Use business meaning, e.g. “ISO 26262 pack”, “Alice patent materials”
Avatar | Optional icon | Visual only
Description | Short blurb | What documents and audience
Permission | Private / tenant-shared | Private = only you; shared = tenant can use
Language | Chinese / English / mixed | Tokenization / embedding tuning

On success, the detail page opens with four tabs: Dataset / Retrieval test / Chunking / Settings.

KB detail tabs

3. Uploading documents (dataset)

3.1 Supported formats

.pdf   .docx   .xlsx   .txt   .md   .html   .csv   .rtf
.ppt   .pptx
.jpg .jpeg .png .gif .bmp .tiff .webp .svg   (with OCR)
.mp3 .wav   (transcription)
.eml   (email)
.dbc   (automotive CAN DB)
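
A pre-upload check can catch unsupported files before they reach the parser. The sketch below only mirrors the extension list above; the group names and the `classify` helper are illustrative assumptions, not part of the product.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the list above; the group names are illustrative.
SUPPORTED = {
    "document": {".pdf", ".docx", ".xlsx", ".txt", ".md", ".html", ".csv", ".rtf"},
    "slides": {".ppt", ".pptx"},
    "image_ocr": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp", ".svg"},
    "audio_transcription": {".mp3", ".wav"},
    "email": {".eml"},
    "can_db": {".dbc"},
}


def classify(filename: str) -> Optional[str]:
    """Return the format group for a file, or None if the extension is not listed."""
    ext = Path(filename).suffix.lower()
    for group, extensions in SUPPORTED.items():
        if ext in extensions:
            return group
    return None


if __name__ == "__main__":
    for name in ("spec.PDF", "bus.dbc", "legacy.doc"):
        print(name, "->", classify(name))
```

Matching is case-insensitive on the extension only; it does not inspect file contents, so a corrupt file with a valid extension would still end up in the Failed state after upload.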

3.2 Upload methods

  • Drag-drop files or folders
  • Click upload
  • Batch: up to 100 files per batch
  • Folder recursion keeps paths as tags
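
For scripted uploads, the two constraints above can be sketched as follows: split a file list into batches of at most 100, and derive tags from the folder part of each relative path. How the product actually turns paths into tags is not documented here, so `path_tags` is an assumption (one tag per folder component).

```python
from pathlib import PurePosixPath
from typing import Iterator, List

MAX_BATCH = 100  # per-batch limit stated above


def batches(paths: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` paths."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def path_tags(relative_path: str) -> List[str]:
    """Assumed tagging rule: one tag per folder component, in path order."""
    return list(PurePosixPath(relative_path).parent.parts)


if __name__ == "__main__":
    files = [f"standards/iso/part_{i}.pdf" for i in range(250)]
    print([len(batch) for batch in batches(files)])
    print(path_tags(files[0]))
```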

File list with parse status

3.3 Parse states

State | Meaning | Typical time / action
Pending | Uploaded, not queued | -
Parsing | Chunking + embedding | ~1–30s per file
Parsed | Retrievable | -
Failed | Corrupt / unsupported / OCR timeout | Retry or replace

Failed documents are never retrieved. Production has a scheduled retry, but do not rely on it; check the parse status after every upload.
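
The post-upload check can be reduced to a small triage over the four states. This is a minimal sketch: the `ParseState` enum and `triage` function are hypothetical, and how the statuses are fetched (UI or API) is left out.

```python
from enum import Enum
from typing import Dict, List


class ParseState(Enum):
    PENDING = "Pending"
    PARSING = "Parsing"
    PARSED = "Parsed"
    FAILED = "Failed"


def triage(statuses: Dict[str, ParseState]) -> Dict[str, List[str]]:
    """Group file names by the next action implied by their parse state."""
    report: Dict[str, List[str]] = {"retrievable": [], "wait": [], "retry_or_replace": []}
    for name, state in sorted(statuses.items()):
        if state is ParseState.PARSED:
            report["retrievable"].append(name)
        elif state is ParseState.FAILED:
            report["retry_or_replace"].append(name)
        else:  # Pending or Parsing: not retrievable yet
            report["wait"].append(name)
    return report


if __name__ == "__main__":
    print(triage({
        "a.pdf": ParseState.PARSED,
        "b.png": ParseState.FAILED,
        "c.docx": ParseState.PARSING,
    }))
```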

4. Chunking (parser_id) — critical

Chunking strategy decides roughly 70% of whether a KB works. Pick parser_id under Settings from the 15 options:

Parser | Label | Doc types | Core idea
naive | Generic (default) | Any text | Fixed token chunks + optional delimiters
qa | Q/A pairs | FAQ / chats | Detect Q&A pairs
resume | Résumés | Résumé PDF/DOCX | Section-aware chunks
manual | Manuals | User manuals | Split by h1/h2, keep heading context
table | Tables | Spreadsheet-heavy | Row/cell granularity, headers kept
paper | Papers | Academic PDF | Abstract / intro / method / conclusion
book | Books | Long works | Chapter / section tiers
laws | Legal | Laws / examination guidelines | Clause numbering + hierarchy
presentation | Slides | PPT/PPTX | One slide per chunk + OCR figures
picture | Images | Image-heavy | One image per chunk + OCR/embed
one | Whole doc | Very short texts | Single chunk per document
audio | Audio | Recordings | Transcribe → speaker/time chunks
email | Email | .eml | Thread / sender splits
tag | Tag dictionary | Glossaries | No chunking; referenced as tags
knowledge_graph | KG | Any text | Entity/relation extraction + graph retrieval

[Diagram: parser chooser]
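
One way to turn the parser table into a first guess is a lookup by document kind and file extension. The sketch below is an assumption built only from the "Doc types" column, not the product's chooser logic; `suggest_parser` and the `doc_kind` labels are hypothetical, and any suggestion should still be confirmed in the retrieval test.

```python
from pathlib import Path
from typing import Optional

# Document kinds that need a human hint, mapped to parser_id values from the table.
DOC_KIND_HINTS = {
    "faq": "qa",
    "resume": "resume",
    "manual": "manual",
    "paper": "paper",
    "book": "book",
    "legal": "laws",
    "glossary": "tag",
}

# Extensions whose parser can be guessed without a hint.
EXTENSION_HINTS = {
    ".ppt": "presentation", ".pptx": "presentation",
    ".xlsx": "table", ".csv": "table",
    ".eml": "email",
    ".mp3": "audio", ".wav": "audio",
    ".jpg": "picture", ".jpeg": "picture", ".png": "picture",
}


def suggest_parser(filename: str, doc_kind: Optional[str] = None) -> str:
    """Suggest a parser_id; fall back to the default 'naive' parser."""
    if doc_kind is not None and doc_kind in DOC_KIND_HINTS:
        return DOC_KIND_HINTS[doc_kind]
    return EXTENSION_HINTS.get(Path(filename).suffix.lower(), "naive")


if __name__ == "__main__":
    print(suggest_parser("deck.pptx"))
    print(suggest_parser("support_faq.docx", doc_kind="faq"))
    print(suggest_parser("notes.txt"))
```

An explicit `doc_kind` takes priority over the extension because most of the specialised parsers (qa, laws, paper) cannot be inferred from a .pdf or .docx suffix alone.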

4.0.1 Browsing chunks: ABZ ASPICE example

After upload + parse, open a file under Knowledge base → Dataset to inspect chunks.

This walkthrough uses the ABZ KB (Eclipse S-CORE ASPICE bundle) as the example.

Step 1: file list

Columns: name, folder, chunk count, date, parser, enable toggle, parse state, actions.

ABZ file list

Step 2: open a file

  • Breadcrumb KB / Dataset / Chunks
  • One card per chunk, each with its own enable toggle
  • Tables stay tabular (e.g. Subject / Program / Platform / Compliance)
  • Batch actions: enable / disable / delete
  • Preview controls: full text / ellipsis / search / filter

Chunk browser

The same browser applies to all parsers (naive, qa, paper, picture, etc.); only the chunk granularity differs.

4.1 Common chunk settings

Chunk config form

Param | Range | Meaning
Chunk tokens | 64–2048 | Max tokens per chunk; smaller = sharper retrieval
Delimiters | regex/string | Split boundaries, newline-separated
Auto keywords | 0–30 | BM25 aids per chunk
Auto questions | 0–10 | Hypothetical questions for recall
Layout recognize | ON/OFF | Visual layout for titles/charts; PDF/PPT: ON
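
To see how Chunk tokens and Delimiters interact, here is a minimal sketch of fixed-budget chunking in the spirit of the naive parser. It is not the product's implementation: tokens are approximated by whitespace-separated words, delimiters are treated as literal strings rather than regexes, and `chunk_text` is a hypothetical name.

```python
import re
from typing import List, Sequence


def chunk_text(text: str, max_tokens: int = 128,
               delimiters: Sequence[str] = ("\n",)) -> List[str]:
    """Split on delimiters, then pack segments into chunks of at most max_tokens words."""
    if not 64 <= max_tokens <= 2048:
        raise ValueError("max_tokens must be within 64-2048")
    pattern = "|".join(re.escape(d) for d in delimiters if d)
    segments = re.split(pattern, text) if pattern else [text]

    chunks: List[str] = []
    current: List[str] = []
    for segment in segments:
        words = segment.split()
        # A segment larger than the budget is cut into budget-sized pieces.
        while len(words) > max_tokens:
            if current:
                chunks.append(" ".join(current))
                current = []
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        # Start a new chunk when this segment would overflow the current one.
        if len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "\n".join(" ".join(["word"] * 40) for _ in range(3))
    for size in (64, 100):
        print(size, [len(c.split()) for c in chunk_text(sample, max_tokens=size)])
```

A smaller budget produces more, shorter chunks that each cover one topic, which is the "smaller = sharper retrieval" trade-off in the table; the cost is more chunks to embed and less context per hit.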

4.2 Embedding model

Settings → Embedding model:

  • bge-large-zh-v1.5 (Chinese default, 1024-dim)
  • bge-m3 (multilingual, 1024-dim)
  • text-embedding-3-small / -3-large (OpenAI)
  • Self-hosted: GPUStack / Ollama / Xinference

Once documents are parsed, the embedding model cannot be swapped, because vectors produced by different models live in incompatible vector spaces. If the model is wrong, recreate the KB. Decide before ingesting.

5. Retrieval test

Use the Retrieval test tab to validate chunks before Agents use them.

Retrieval test UI

  • Enter query
  • Pick Vector / Text / Hybrid
  • Tune top_k (1–30)
  • Inspect scores + source doc + chunk IDs

Heuristics:

  • Target snippet in Top-5: chunk config OK
  • Not in Top-20: shrink chunk size, enable auto questions, or change embedding
  • Nothing found: parsing state failed or vectors missing
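
The heuristics above can be written down as a small check over the ranked chunk IDs returned by a test query. This is a sketch; `diagnose` is a hypothetical helper, and the "borderline" branch for ranks 6–20 is an assumption, since the heuristics give no guidance for that range.

```python
from typing import List


def diagnose(result_ids: List[str], target_id: str) -> str:
    """Map the rank of the expected chunk to the advice from the heuristics."""
    if not result_ids:
        return "nothing found: check for failed parsing or missing vectors"
    if target_id in result_ids[:5]:
        return "chunk config OK"
    if target_id not in result_ids[:20]:
        return "shrink chunk size, enable auto questions, or change embedding"
    # Ranks 6-20 are not covered by the heuristics; treated as borderline here.
    return "borderline: consider tuning chunk size or auto questions"


if __name__ == "__main__":
    ranked = [f"chunk_{i}" for i in range(1, 31)]  # top_k = 30
    print(diagnose(ranked, "chunk_3"))
    print(diagnose(ranked, "chunk_12"))
    print(diagnose(ranked, "chunk_25"))
    print(diagnose([], "chunk_3"))
```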

6. Binding to Agents (essential)

Knowledge bases do not attach to Agents automatically — bind explicitly.

6.1 Three binding modes

Mode | Entry | When
Session ephemeral | ChatInput 📎 upload | One-off summaries
Combo / Agent API | kb_ids: ["kb_xxx"] in Combo payload | Always-on corp standards
CronJob | Personal center → cron → kb_ids | Periodic scans

Agent template UI may not expose kb_ids yet — use API / cron. A graphical Agent editor checkbox is planned.
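
For the API route, the only field confirmed on this page is kb_ids. The sketch below shows one way to add it to a request body before sending; the `bind_knowledge_bases` helper and the `query` field are hypothetical, and the endpoint and remaining payload schema are deliberately left out because they are not documented here.

```python
import json
from typing import Any, Dict, List


def bind_knowledge_bases(payload: Dict[str, Any], kb_ids: List[str]) -> Dict[str, Any]:
    """Return a copy of the payload with kb_ids set, de-duplicated in order."""
    if not kb_ids:
        raise ValueError("at least one knowledge base ID is required")
    bound = dict(payload)  # leave the caller's payload untouched
    bound["kb_ids"] = list(dict.fromkeys(kb_ids))
    return bound


# "query" is a hypothetical field, used only to make the example complete.
request_body = bind_knowledge_bases(
    {"query": "Summarize the applicable standards"},
    ["kb_xxx", "kb_xxx"],
)

if __name__ == "__main__":
    print(json.dumps(request_body, indent=2))
```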

6.2 Ephemeral vs formal KB

Dimension | Ephemeral | Formal KB
Created by | ChatInput upload | KB menu
Scope | This session only | Cross-session / team
Persistence | Lost if session deleted | Dedicated storage
Agent sees | Auto for session | Requires kb_ids

7. Knowledge graph (parser_id=knowledge_graph)

Adds entity–relation graph:

  • Force-directed visualization
  • Graph QA (“how are A and B related?”)
  • Attribute filters (“entities where type = chip”)
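
Graph QA and attribute filters can be illustrated on a toy graph. The sketch below answers "how are A and B related?" with a breadth-first search over (head, relation, tail) triples and filters entities by type. The triple format, the function names, and the sample data are all made up for illustration; they do not describe how the knowledge_graph parser stores or queries its graph.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def relation_path(triples: List[Triple], start: str, goal: str) -> Optional[List[Triple]]:
    """Shortest chain of relations linking start to goal, ignoring edge direction."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
        graph.setdefault(tail, []).append((f"inverse of {relation}", head))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


def entities_of_type(entities: Dict[str, Dict[str, str]], wanted: str) -> List[str]:
    """Attribute filter: names of entities whose 'type' equals wanted."""
    return sorted(name for name, attrs in entities.items() if attrs.get("type") == wanted)


if __name__ == "__main__":
    sample_triples = [("A", "uses", "C"), ("C", "certified_by", "B")]
    sample_entities = {"A": {"type": "chip"}, "B": {"type": "standard"}, "C": {"type": "chip"}}
    print(relation_path(sample_triples, "A", "B"))
    print(entities_of_type(sample_entities, "chip"))
```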

KG visualization

See Skills / ecosystem for related capabilities.

8. FAQ

Q: Uploaded but answers are weak?
A: Check: (1) parsed ✅ (2) retrieval test hits top-5 (3) Agent has kb_ids (4) Plan Mode not stuck in Fast-only skips.

Q: Duplicate uploads?
A: Dedup by hash + filename. Same name overwrites slices.
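
A sketch of how such a rule could behave, under stated assumptions: SHA-256 stands in for the unspecified hash, a file with the same name and the same hash is skipped, and a file with the same name but different content overwrites the stored copy. The `upload_action` helper and its in-memory index are hypothetical.

```python
import hashlib
from typing import Dict


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest; the algorithm the product uses is not documented."""
    return hashlib.sha256(data).hexdigest()


def upload_action(index: Dict[str, str], filename: str, data: bytes) -> str:
    """Decide what an upload does; index maps filename -> hash of the stored copy."""
    digest = content_hash(data)
    if filename not in index:
        index[filename] = digest
        return "add"
    if index[filename] == digest:
        return "skip"  # same name, same content: duplicate
    index[filename] = digest
    return "overwrite"  # same name, new content: old chunks are replaced


if __name__ == "__main__":
    stored: Dict[str, str] = {}
    print(upload_action(stored, "spec.pdf", b"version 1"))
    print(upload_action(stored, "spec.pdf", b"version 1"))
    print(upload_action(stored, "spec.pdf", b"version 2"))
```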

Q: Disable noisy chunks?
A: Per-chunk enable toggle under Chunks tab.

Q: KB size limits?
A: Soft guidance < ~1M chunks/KB for UI comfort; ES/Infinity can scale further.

Q: Teammate deleted a KB — can I keep using it?
A: No. Deletion is a hard delete. Remove the stale kb_ids from your bindings or replace them with another KB.

9. Next steps
