Nox-Lumen Mfg

parser-sdk

One-liner

ragbase-parser-sdk gets a Parser SKILL shipped in hours. It covers CLI scaffolding, contract validation, local dry-runs, and packaging, so you focus on chunking semantics and nothing else.

Install

Email for access

Packages are not on public PyPI. Email info@nox-lumen.com explaining your parser scenario (formats / data domain). We reply with private PyPI creds + templates.

Then:

pip install ragbase-parser-sdk

Minimal end-to-end

# 1) Scaffold
ragbase-cli init parser my-parser
cd my-parser
 
# 2) Inspect the default template (already runnable)
ls
# pyproject.toml  manifest.json  src/my_parser/main.py  examples/sample.txt
 
# 3) Dry-run locally
ragbase-cli parse-test --input examples/sample.txt --output ./out
cat ./out/chunks.jsonl   # chunks
cat ./out/result.json    # metadata
 
# 4) Implement
vi src/my_parser/main.py
 
# 5) Validate + bundle
ragbase-cli skill validate
ragbase-cli skill build      # emits .ragskill
 
# 6) Push
ragbase-cli skill push

Your code surface: implement parse

from ragbase.parser_sdk import Parser, ParseContext, ParseResult, Chunk
 
class MyParser(Parser):
    def parse(self, ctx: ParseContext) -> ParseResult:
        text = ctx.read_text()
        chunks = []
        for i, section in enumerate(self.split_by_my_logic(text)):
            chunks.append(Chunk(
                content=section.body,
                metadata={
                    "section_no": i,
                    "tags": section.tags,
                },
            ))
        return ParseResult(chunks=chunks)
 
if __name__ == "__main__":
    from ragbase.parser_sdk import make_cli
    make_cli(MyParser).run()

make_cli wraps your subclass with CLI entrypoints (parse / partial-parse / info) honoring the contract the platform expects.
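
The split_by_my_logic call in the example is a placeholder for your own segmentation. A minimal sketch of one possible splitter, assuming sections start at lines beginning with "# " and may carry a "tags: a, b" line; the Section dataclass and this input convention are hypothetical and not part of the SDK:

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Hypothetical container exposing the .body / .tags attributes used above."""
    body: str
    tags: list[str] = field(default_factory=list)


def split_by_my_logic(text: str) -> list[Section]:
    """Split text into sections at lines starting with '# ' (assumed convention)."""
    sections: list[Section] = []
    current: list[str] = []
    tags: list[str] = []

    for line in text.splitlines():
        if line.startswith("# "):
            # A new heading closes the previous section, if it has any content.
            body = "\n".join(current).strip()
            if body:
                sections.append(Section(body=body, tags=tags))
            current, tags = [line], []
        elif line.lower().startswith("tags:"):
            # "tags: a, b" becomes ["a", "b"] and is kept out of the body.
            tags = [t.strip() for t in line[5:].split(",") if t.strip()]
        else:
            current.append(line)

    # Close the final section.
    body = "\n".join(current).strip()
    if body:
        sections.append(Section(body=body, tags=tags))
    return sections
```

Inside a Parser subclass this would be a method (or be called as a plain function) so that each returned Section feeds one Chunk.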

Incremental parse (avoid re-chunking the whole repo)

Add a decorator:

from ragbase.parser_sdk.decorators import incremental_update
 
class MyParser(Parser):
    @incremental_update
    def parse(self, ctx: ParseContext) -> ParseResult:
        ...

At runtime the platform invokes partial-parse, which re-chunks only the changed portions instead of the whole input.
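
This page does not describe how deltas are detected. Purely as an illustration of the general idea, the sketch below hashes each section and reports which ones differ between two versions of a file. The function names and the blank-line section split are hypothetical and are not the SDK's mechanism:

```python
import hashlib


def section_hashes(text: str) -> dict[int, str]:
    """Map section index -> SHA-256 digest (sections are split on blank lines)."""
    sections = [s for s in text.split("\n\n") if s.strip()]
    return {
        i: hashlib.sha256(s.encode("utf-8")).hexdigest()
        for i, s in enumerate(sections)
    }


def changed_sections(old_text: str, new_text: str) -> list[int]:
    """Return indices of sections in new_text that are new or differ from old_text."""
    old, new = section_hashes(old_text), section_hashes(new_text)
    return [i for i, digest in new.items() if old.get(i) != digest]


print(changed_sections("a\n\nb\n\nc", "a\n\nB\n\nc\n\nd"))  # [1, 3]
```

Comparing by index is deliberately naive: inserting a section near the top shifts every later index and marks them all as changed. A content-addressed comparison would avoid that at the cost of more bookkeeping.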

manifest.json — advertise capabilities

{
  "name": "my-parser",
  "version": "1.0.0",
  "kind": "parser",
  "capabilities": {
    "extensions": [".gcode", ".gco"],
    "mime_types": ["application/x-gcode"],
    "incremental": true,
    "vlm_required": false
  },
  "entry": "python -m my_parser"
}

At startup the platform routes files to your parser by capabilities.extensions, so no platform code changes are needed.
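
To make the routing idea concrete, here is a small sketch of an extension lookup built from manifests. It is not the platform's router: build_extension_table and route are hypothetical names, and the first-declaration-wins tie-break is an assumption.

```python
import json
from pathlib import Path
from typing import Optional


def build_extension_table(manifests: list[dict]) -> dict[str, str]:
    """Map each lowercase extension to the name of the parser that declares it."""
    table: dict[str, str] = {}
    for manifest in manifests:
        if manifest.get("kind") != "parser":
            continue
        for ext in manifest.get("capabilities", {}).get("extensions", []):
            # Assumption: the first declaration wins; the real tie-break is not documented here.
            table.setdefault(ext.lower(), manifest["name"])
    return table


def route(path: str, table: dict[str, str]) -> Optional[str]:
    """Return the parser name for a file path, or None if no parser claims it."""
    return table.get(Path(path).suffix.lower())


manifest = json.loads("""
{
  "name": "my-parser",
  "version": "1.0.0",
  "kind": "parser",
  "capabilities": {"extensions": [".gcode", ".gco"], "incremental": true},
  "entry": "python -m my_parser"
}
""")
table = build_extension_table([manifest])
print(route("jobs/bracket.GCODE", table))  # my-parser
print(route("README.md", table))           # None
```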

Contract outputs

File          Payload
chunks.jsonl  One JSON chunk per line → ES ingestion
result.json   Metadata / stats / errors

Key Chunk fields (subset):

Field                  Role
content                Required chunk text
metadata.tags          Business filters for retrieval
metadata.outline_path  Structural path (e.g., claims.1.dependent)
metadata.source_ref    Provenance offsets / lines
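
During the dry-run loop it can help to sanity-check chunks.jsonl before running skill validate. The sketch below assumes each line is a JSON object with "content" and "metadata" keys mirroring the Chunk constructor; that serialized shape is an assumption, and check_chunks is a hypothetical helper, not an SDK function.

```python
import json


def check_chunks(lines):
    """Yield (line_no, problem) for each JSONL line that breaks the assumed shape."""
    for line_no, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # ignore blank lines
        try:
            chunk = json.loads(raw)
        except json.JSONDecodeError as exc:
            yield line_no, f"invalid JSON: {exc.msg}"
            continue
        if not isinstance(chunk, dict):
            yield line_no, "chunk is not a JSON object"
            continue
        content = chunk.get("content")
        if not isinstance(content, str) or not content.strip():
            yield line_no, "missing or empty content"
        metadata = chunk.get("metadata")
        tags = metadata.get("tags", []) if isinstance(metadata, dict) else []
        if not isinstance(tags, list):
            yield line_no, "metadata.tags is not a list"


sample = [
    '{"content": "hello", "metadata": {"tags": ["a"]}}',
    '{"content": ""}',
    "not json",
    '{"content": "x", "metadata": {"tags": "a"}}',
]
for line_no, problem in check_chunks(sample):
    print(f"line {line_no}: {problem}")
```

The function accepts any iterable of lines, so an open file handle on ./out/chunks.jsonl can be passed in place of the sample list.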

Code-oriented parsers may also populate function_decls / class_decls / references / imports (M5). See Parser skills · Code-aware parsers.

Debug playbook

  • Fast loop: edit → parse-test → inspect chunks.jsonl
  • Contract drift: skill validate mirrors the A-layer protocol
  • Tracing: ragbase-cli parse-test --debug keeps intermediates

Real-world patterns

General doc examples

File type                    Parser idea
G-code (manufacturing)       Chunk per process segment toggling G91 ↔ G90; metadata captures cycle time estimates, tool#, spindle RPM
AUTOSAR ARXML subset (auto)  Chunk per SWC / port / interface; keep trace attrs for reqs
Corporate Word templates     Chunk by Heading styles H1/H2; outline_path anchors citations
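
As an illustration of the G-code row, the sketch below starts a new segment whenever the positioning mode toggles between G90 and G91 and records the last tool number and spindle speed seen. Segment and split_gcode are hypothetical names, cycle time estimation is left out, and real programs may need controller-specific handling.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

COMMENT_RE = re.compile(r"\(.*?\)|;.*")       # (inline) and ; trailing comments
MODE_RE = re.compile(r"G0*(9[01])(?![\d.])")  # G90 / G91, but not G90.1 or G910
TOOL_RE = re.compile(r"T(\d+)")
SPINDLE_RE = re.compile(r"S(\d+(?:\.\d+)?)")


@dataclass
class Segment:
    mode: Optional[str] = None
    tool: Optional[int] = None
    spindle_rpm: Optional[float] = None
    lines: list[str] = field(default_factory=list)


def split_gcode(text: str) -> list[Segment]:
    """Split a G-code program into segments at each G90 <-> G91 toggle."""
    segments: list[Segment] = []
    current = Segment()
    for raw in text.splitlines():
        code = COMMENT_RE.sub("", raw).upper()
        mode_match = MODE_RE.search(code)
        if mode_match:
            new_mode = "G" + mode_match.group(1)
            if current.mode is not None and new_mode != current.mode and current.lines:
                segments.append(current)
                # Tool and spindle speed are modal, so carry them into the next segment.
                current = Segment(new_mode, current.tool, current.spindle_rpm)
            else:
                current.mode = new_mode
        tool_match = TOOL_RE.search(code)
        if tool_match:
            current.tool = int(tool_match.group(1))
        spindle_match = SPINDLE_RE.search(code)
        if spindle_match:
            current.spindle_rpm = float(spindle_match.group(1))
        if raw.strip():
            current.lines.append(raw)
    if current.lines:
        segments.append(current)
    return segments


program = """T1 M6
S1200 M3
G90 G0 X0 Y0
G1 X10 F100
G91
G1 X5
G1 Y5
G90
G0 Z50 (retract)
"""
for seg in split_gcode(program):
    print(seg.mode, seg.tool, seg.spindle_rpm, len(seg.lines))
```

Each Segment would then map to one Chunk, with mode, tool, and spindle_rpm placed in its metadata.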

Code: bespoke slicer

Built-in code-aware parsers default to function/class atomic nodes. Customize when you need:

  • Semantic segments: docblock + signature + body = chunk
  • Commit hunks: use git blame to bundle same-commit edits
  • Call graph locality: tightly coupled symbols share a chunk
  • Business namespaces: carve monorepos by folder / package prefix
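
The "business namespaces" option can be sketched as grouping repo-relative paths under the longest matching folder prefix. The helper name, the example paths, and the "(unassigned)" bucket are hypothetical; is_relative_to requires Python 3.9 or later.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_namespace(paths: list[str], prefixes: list[str]) -> dict[str, list[str]]:
    """Group repo-relative paths under the longest matching namespace prefix."""
    # Try deeper prefixes first so "services/billing" wins over "services".
    ordered = sorted(
        (PurePosixPath(p) for p in prefixes),
        key=lambda p: len(p.parts),
        reverse=True,
    )
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        candidate = PurePosixPath(path)
        for prefix in ordered:
            if candidate.is_relative_to(prefix):
                groups[str(prefix)].append(path)
                break
        else:
            groups["(unassigned)"].append(path)
    return dict(groups)


paths = [
    "services/billing/api/handlers.py",
    "services/billing/models.py",
    "services/search/index.py",
    "tools/lint.py",
]
print(group_by_namespace(paths, ["services", "services/billing"]))
```

The resulting group name is a natural candidate for a metadata.tags entry, so retrieval can filter by namespace.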

With parser-sdk you build a sibling SKILL that coexists with the stock code-aware parser rather than replacing it.

📥 Runnable reference download

Fully worked sample that chunks by git hunk while honoring chunk contracts:

hunk_aware_code_parser.py (save-as)

Follow the header comment (~5 steps) to skill push.

Core excerpt:

from ragbase.parser_sdk import Parser, ParseContext, ParseResult, Chunk
from tree_sitter_languages import get_parser
 
class HunkAwareCodeParser(Parser):
    """Chunk by git hunk — closer to PR review than per-function slicing."""
 
    def parse(self, ctx: ParseContext) -> ParseResult:
        source = ctx.read_text()
        source_bytes = source.encode("utf-8")    # hunk offsets are bytes, so slice bytes, not str
        hunks = self.git_blame_hunks(ctx.path)   # your hunk segmentation
        tree = get_parser(ctx.language).parse(source_bytes)
 
        chunks = []
        for hunk in hunks:
            # Pull symbols overlapping the hunk into the same chunk
            symbols = self.symbols_in_range(tree, hunk.start_byte, hunk.end_byte)
            chunks.append(Chunk(
                content=source_bytes[hunk.start_byte:hunk.end_byte].decode("utf-8", errors="replace"),
                metadata={
                    "outline_path": ".".join(symbols[-1].scope_chain) if symbols else "",
                    "function_decls": [s.to_dict() for s in symbols if s.kind == "function"],
                    "class_decls": [s.to_dict() for s in symbols if s.kind == "class"],
                    "commit_sha": hunk.commit,
                    "hunk_author": hunk.author,
                    "tags": [f"author:{hunk.author}", f"commit:{hunk.commit[:8]}"],
                },
            ))
        return ParseResult(chunks=chunks)
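
The excerpt leaves git_blame_hunks to you. One way to sketch it is to parse the output of git blame --porcelain and merge consecutive lines from the same commit. The Hunk class and function names here are hypothetical, and the sketch yields 1-based line ranges; converting them to the start_byte / end_byte offsets used in the excerpt is not shown.

```python
import re
import subprocess
from dataclasses import dataclass

# Porcelain header: <sha> <orig-line> <final-line> [<lines-in-group>]
HEADER_RE = re.compile(r"^([0-9a-f]{40,64}) (\d+) (\d+)(?: (\d+))?$")


@dataclass
class Hunk:
    commit: str
    author: str
    start_line: int  # 1-based, inclusive, in the current file
    end_line: int


def parse_blame_porcelain(output: str) -> list[Hunk]:
    """Merge consecutive blamed lines from the same commit into hunks."""
    authors: dict[str, str] = {}  # commit metadata is printed only once per commit
    hunks: list[Hunk] = []
    sha, final_line = "", 0
    for line in output.split("\n"):
        if line.startswith("\t"):
            # Content line: belongs to the most recent header.
            last = hunks[-1] if hunks else None
            if last and last.commit == sha and last.end_line == final_line - 1:
                last.end_line = final_line
            else:
                hunks.append(Hunk(sha, authors.get(sha, ""), final_line, final_line))
            continue
        match = HEADER_RE.match(line)
        if match:
            sha, final_line = match.group(1), int(match.group(3))
        elif line.startswith("author ") and sha:
            authors[sha] = line[len("author "):]
    return hunks


def git_blame_hunks(path: str) -> list[Hunk]:
    """Run git blame on a tracked file and return its hunks (requires git on PATH)."""
    result = subprocess.run(
        ["git", "blame", "--porcelain", "--", path],
        capture_output=True, text=True, check=True,
    )
    return parse_blame_porcelain(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the segmentation testable against captured blame output without a repository.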

Declare extensions + precedence in manifest:

{
  "name": "hunk-aware-code-parser",
  "version": "1.0.0",
  "kind": "parser",
  "capabilities": {
    "extensions": [".py", ".java", ".go", ".rs"],
    "priority": 50
  },
  "entry": "python -m hunk_aware_code_parser"
}

Relationship to built-in code-aware — Each KB selects one active code parser via KB settings. Multiple SKILL parsers can coexist; e.g., default repos use code-aware, PR-review KB swaps to hunk-aware-code-parser.

Do we lose code fields? No. Populate outline_path, function_decls, class_decls, references, and imports exactly as before, and the three-tier code index still applies. You only redefine chunk boundaries.

For broader customization (languages, forks of built-ins) see Parser skills · Code is also a doc.

Upgrade notes (0.x → 1.0)

Namespaces moved from ragbase_parser_sdk to ragbase.parser_sdk:

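# sed cannot edit a directory, so list the matching files first (GNU sed syntax)
grep -rl 'ragbase_parser_sdk' your_skill_dir/ | xargs sed -i 's/ragbase_parser_sdk/ragbase.parser_sdk/g'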
