Nox-Lumen Mfg

docx

Capabilities

Full read/write coverage for .docx: simple paragraphs/tables through bookmarks, cross-document links, track changes, comments, and complex styling.

Area | Support
Read | Paragraphs, tables, images, revisions, comments, styles; pandoc or unpack ingestion
Author | Build from scratch (docx-js) with headings, lists, tables, images, TOC, headers/footers, page sizes
Edit | Text substitution, table edits, insert/delete paragraphs, images, find/replace
Bookmarks & links | Paragraph/run bookmarks, internal anchors, cross-document hyperlinks
Revisions & comments | <w:ins> / <w:del> track changes, standardized comment bodies
Highlighting | Keyword hits with run splitting
Validation | docx_validator checks ZIP integrity, bookmark pairs, r:id vs. rels consistency
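
The three validation checks named above can be approximated with the standard library. This is a rough sketch, not the docx_validator implementation: check_package and its message strings are hypothetical, and it uses regular expressions only to read attributes, never to write XML.

```python
import io
import re
import zipfile

RELS_PART = "word/_rels/document.xml.rels"


def check_package(data: bytes) -> list[str]:
    """Return a list of problems found in a .docx given as raw bytes."""
    problems = []
    try:
        zf = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        return ["not a valid ZIP archive"]
    with zf:
        bad = zf.testzip()  # name of the first member with a bad CRC, or None
        if bad is not None:
            problems.append(f"corrupt member: {bad}")
        names = set(zf.namelist())
        if "word/document.xml" not in names:
            return problems + ["missing word/document.xml"]
        doc = zf.read("word/document.xml").decode("utf-8")
        rels = zf.read(RELS_PART).decode("utf-8") if RELS_PART in names else ""

    # Bookmark pairs: each bookmarkStart id needs a bookmarkEnd with the same id.
    starts = set(re.findall(r'<w:bookmarkStart\b[^>]*\bw:id="([^"]+)"', doc))
    ends = set(re.findall(r'<w:bookmarkEnd\b[^>]*\bw:id="([^"]+)"', doc))
    for bookmark_id in sorted(starts ^ ends):
        problems.append(f"unpaired bookmark id: {bookmark_id}")

    # r:id consistency: each r:id used in the body must be declared in the rels part.
    used = set(re.findall(r'\br:id="([^"]+)"', doc))
    declared = set(re.findall(r'<Relationship\b[^>]*\bId="([^"]+)"', rels))
    for rid in sorted(used - declared):
        problems.append(f"r:id without relationship: {rid}")
    return problems
```

An empty list means the package passed these three checks; it says nothing about whether Word will render the document as intended.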

Two toolchains: python-docx first, XML when required

Rule of thumb: stick to python-docx until you hit a limitation; only then unpack → edit XML → repack.
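
For orientation, the unpack → edit XML → repack route can be sketched with the standard library alone. edit_document_xml and its transform callback are hypothetical names for illustration; the skill's own helpers are assumed to handle this step for you.

```python
import os
import shutil
import tempfile
import zipfile


def edit_document_xml(src: str, dst: str, transform) -> None:
    """Unpack src, apply transform(str) -> str to word/document.xml, repack to dst."""
    workdir = tempfile.mkdtemp(prefix="docx_")
    try:
        with zipfile.ZipFile(src) as zin:
            names = zin.namelist()  # remember the original member order
            zin.extractall(workdir)

        part = os.path.join(workdir, "word", "document.xml")
        with open(part, encoding="utf-8", newline="") as fh:
            xml = fh.read()
        with open(part, "w", encoding="utf-8", newline="") as fh:
            fh.write(transform(xml))

        tmp_out = dst + ".tmp"
        with zipfile.ZipFile(tmp_out, "w", zipfile.ZIP_DEFLATED) as zout:
            for name in names:
                if not name.endswith("/"):  # skip directory entries
                    zout.write(os.path.join(workdir, name), arcname=name)
        os.replace(tmp_out, dst)  # safe even when src and dst are the same path
    finally:
        shutil.rmtree(workdir, ignore_errors=True)
```

Writing to a temporary file and swapping it in at the end means a failed transform never leaves a half-written package behind.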

Need | python-docx | XML route
Paragraphs / tables / images | ✅ straightforward | ✅ supported
Styles / rich formatting | ⚠️ partial | ✅ full fidelity
Bookmarks / hyperlinks | - | ✅ via docx_link_engine
Track changes / comments | - | ✅ XML mandatory

Core modules (ragbase_skills.docx)

Module | Role
docx_utils | Paragraph walks (skip TOC), safe text edits, highlights, in-place inserts
keyword_locator | Keyword search with para_range, table fallback, disambiguation
docx_link_engine | Bookmarks, hyperlinks, coordinate-driven batch_operations
docx_pipeline | Composition: locate+mutate in one step, automatic bidirectional linking
docx_validator | Post-edit validation
comment | Canonical comment text (build_mapping_comment)

These modules are pre-importable inside the Agent's execute_code environment, so use them instead of hand-authoring raw XML strings.

ASPICE “requirements ↔ tests” traceability and patent “spec ↔ figure list” jumps rely on cross-doc links. The skill wraps a four-step primitive:

[Diagram: four-step cross-document link primitive]

Pipeline helper:

from ragbase_skills.docx import docx_pipeline as pipeline
 
pipeline.bidirectional_link(
    pairs=[("req.docx", "REQ_001", "test.docx", "TEST_001")],
    output_dir="out/"
)

Cross-document troubleshooting

SymptomCauseFix
Opens home page instead of anchorUsed w:anchor (internal-only)Cross-doc must use r:id + relationship
Lands in TOC, not body_Toc bookmark noisePlace bookmarks on body paragraphs; neutralize _Toc if needed
Word navigation fails silentlyBookmark name > 40 chars (CJK sanitization)Keep names ≤ 40 chars, bm_ prefix
Field hyperlinks not clickableControl fldChar runs carry rPrKeep control runs style-free; formatting only on visible text runs

See references/hyperlinks.md inside the skill for the full checklist.
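
The naming constraints from the table (at most 40 characters, bm_ prefix) are easy to enforce before any bookmark is written. The helper below is a sketch; safe_bookmark_name is a hypothetical function, not part of docx_link_engine, and the character filtering is an assumption about what counts as a safe name.

```python
import re

MAX_LEN = 40  # limit quoted in the troubleshooting table


def safe_bookmark_name(raw: str, prefix: str = "bm_") -> str:
    """Return a name with the prefix, only word characters, and at most MAX_LEN chars."""
    # Collapse every run of non-word characters (spaces, dashes, brackets) to "_".
    body = re.sub(r"\W+", "_", raw).strip("_")
    if body.startswith(prefix):
        body = body[len(prefix):]  # avoid a doubled prefix
    return (prefix + body)[:MAX_LEN]


print(safe_bookmark_name("REQ-001 (draft)"))  # bm_REQ_001_draft
```

Truncation can make two long names collide, so it is worth checking the generated names for duplicates before a batch run.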

Bulk + bidirectional workflows

Real programs need hundreds of coordinated bookmarks/links. docx_link_engine supports batch flows:

Bulk bookmarks (single unzip)

from ragbase_skills.docx import docx_link_engine as dle
 
specs = [("REQ_001", "bm_REQ_001"), ("REQ_002", "bm_REQ_002"), ...]
count = dle.batch_insert_bookmarks_on_run("req.docx", specs, "req.docx")

Bulk cross-document hyperlinks

specs = [
    # (keyword, target doc, target bookmark, display text)
    ("REQ_001", "test.docx", "bm_TC_001", "Test case TC_001"),
    ("REQ_002", "test.docx", "bm_TC_002", None),  # None keeps visible text
]
count = dle.batch_insert_cross_document_hyperlinks_on_run("req.docx", specs, "req.docx")

Split one keyword into many targets

links_created, rids = dle.split_run_with_cross_document_hyperlinks(
    "req.docx",
    text="REQ_001",
    targets=[
        ("test.docx", "bm_TC_001_01"),
        ("test.docx", "bm_TC_001_02"),
        ("test.docx", "bm_TC_001_03"),
    ],
    out="req.docx",
    nearby="functional requirements",
    para_range=(120, 200),
)

For large specs, run keyword_locator.batch_locate() once, then feed the resulting coordinates into batch_operations:

from ragbase_skills.docx import keyword_locator, docx_link_engine as dle
 
locators = keyword_locator.batch_locate("req.docx", keywords=[...])
 
operations = [
    {"op": "bookmark",        "locator": locators["REQ_001"], "bookmark_name": "bm_REQ_001"},
    {"op": "hyperlink",       "locator": locators["REQ_002"], "target_doc": "test.docx", "bookmark_name": "bm_TC_002"},
    {"op": "split_hyperlink", "locator": locators["REQ_003"], "targets": [("test.docx", "bm_TC_003_01"), ("test.docx", "bm_TC_003_02")]},
]
result = dle.batch_operations("req.docx", operations, "req.docx")
# → {"bookmarks": 1, "hyperlinks": 1, "splits": 1, "failed": 0, "failed_details": []}
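
A batch that reports failures should usually stop the pipeline rather than ship a partially linked document. A minimal guard over the result shape shown above might look like this; ensure_batch_ok is a hypothetical helper, and the format of the failed_details entries is an assumption, so they are simply stringified.

```python
def ensure_batch_ok(result: dict) -> int:
    """Return the number of successful operations, or raise if any failed."""
    if result.get("failed", 0):
        details = "; ".join(str(item) for item in result.get("failed_details", []))
        raise RuntimeError(f"{result['failed']} operation(s) failed: {details}")
    return sum(result.get(key, 0) for key in ("bookmarks", "hyperlinks", "splits"))


done = ensure_batch_ok(
    {"bookmarks": 1, "hyperlinks": 1, "splits": 1, "failed": 0, "failed_details": []}
)
print(done)  # 3
```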

N-to-N bidirectional example

# Phase 1: bookmarks on both documents (single unzip per side)
for text_a, bm_a, pr_a, nb_a, text_b, bm_b, pr_b, nb_b in mapping:
    dle.insert_bookmark(local_a, text_a, bm_a, local_a, nearby=nb_a, para_range=pr_a)
    dle.insert_bookmark(local_b, text_b, bm_b, local_b, nearby=nb_b, para_range=pr_b)
 
# Phase 2: mutual cross-doc hyperlinks
# (final_a_name / final_b_name are presumably the delivered file names the links point to)
for text_a, bm_a, pr_a, nb_a, text_b, bm_b, pr_b, nb_b in mapping:
    dle.insert_cross_document_hyperlink(local_a, text_b, final_b_name, bm_b, local_a, nearby=nb_a, para_range=pr_a)
    dle.insert_cross_document_hyperlink(local_b, text_a, final_a_name, bm_a, local_b, nearby=nb_b, para_range=pr_b)

Runtime footguns

Footgun 1: docx ≠ out overwriting progress

# ❌ Each loop reads pristine local_original, wiping prior bookmarks
for text, bm in pairs:
    dle.insert_bookmark(local_original, text, bm, local_output)
 
# ✅ Write in place so every iteration sees the last mutation
for text, bm, pr in pairs:
    dle.insert_bookmark(local, text, bm, local, nearby=..., para_range=pr)
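
The effect is easy to reproduce without any DOCX machinery. In this toy simulation, add_marker is a hypothetical stand-in for an insert_* call: it reads one path, appends a marker, and writes another path.

```python
import os
import tempfile


def add_marker(src: str, marker: str, dst: str) -> None:
    """Toy stand-in for an insert_* call: read src, append one marker, write dst."""
    with open(src, encoding="utf-8") as fh:
        content = fh.read()
    with open(dst, "w", encoding="utf-8") as fh:
        fh.write(content + f"[{marker}]")


workdir = tempfile.mkdtemp(prefix="docx_")
original = os.path.join(workdir, "original.txt")
output = os.path.join(workdir, "output.txt")
with open(original, "w", encoding="utf-8") as fh:
    fh.write("body")

# Wrong: every call starts again from the pristine original.
for marker in ("bm_A", "bm_B", "bm_C"):
    add_marker(original, marker, output)
with open(output, encoding="utf-8") as fh:
    lost = fh.read()

# Right: copy once, then always read and write the same working file.
with open(original, encoding="utf-8") as src, open(output, "w", encoding="utf-8") as dst:
    dst.write(src.read())
for marker in ("bm_A", "bm_B", "bm_C"):
    add_marker(output, marker, output)
with open(output, encoding="utf-8") as fh:
    kept = fh.read()

print(lost)  # body[bm_C]
print(kept)  # body[bm_A][bm_B][bm_C]
```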

Footgun 2: Mixing doc.save() with dle.insert_*() in one loop

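python-docx holds the document in memory and doc.save() writes that whole model back to disk, while dle.insert_*() reads and rewrites the XML on disk. Alternate the two and each doc.save() overwrites whatever dle wrote after the document was loaded.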

# ❌ Each doc.save() nukes prior dle hyperlink edits
for gac_id in func_ids:
    docx_utils.safe_append_text(para, f" {gac_id}")
    doc.save(local_path)
    dle.insert_cross_document_hyperlink_on_run(local_path, gac_id, ...)
 
# ✅ Phase A: python-docx batch, single save; Phase B: dle batch
for gac_id in func_ids:
    docx_utils.safe_append_text(para_map[gac_id], f" {gac_id}")
doc.save(local_path)
 
specs = [(gac_id, target_doc, gac_id, gac_id) for gac_id in func_ids]
dle.batch_insert_cross_document_hyperlinks_on_run(local_path, specs, local_path)

Keyword disambiguation

Operations like “replace XX with YY in §3.2.1” go wrong with a global str.find(): the same string usually also appears in the TOC and in other chapters. keyword_locator narrows the search with a paragraph range plus context words:

from ragbase_skills.docx import keyword_locator
 
result = keyword_locator.locate_keyword(
    "spec.docx",
    keyword="30",
    nearby="breach of contract",   # must appear nearby
    para_range=(120, 150),       # coarse range from ES retrieval
    match_mode="run"             # run / paragraph / fuzzy
)
# → {"found": True, "paragraph_idx": 135, "run_idx": 3, ...}

This usually pairs with the document-editing skill: ES retrieval gives a coarse para_range, keyword_locator refines it to exact coordinates, and docx_link_engine performs the mutation.
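
The narrowing idea itself fits in a few lines. The sketch below works on a plain list of paragraph strings and is not the keyword_locator implementation: locate is a hypothetical function, and it assumes the range is inclusive and that the context words must sit in the same paragraph as the keyword.

```python
from typing import Optional


def locate(paragraphs: list[str], keyword: str, nearby: str,
           para_range: tuple[int, int]) -> Optional[int]:
    """Return the index of the only paragraph in range that contains both
    keyword and nearby; return None for no match or an ambiguous match."""
    start, end = para_range
    hits = [
        idx
        for idx in range(max(start, 0), min(end + 1, len(paragraphs)))
        if keyword in paragraphs[idx] and nearby.lower() in paragraphs[idx].lower()
    ]
    return hits[0] if len(hits) == 1 else None


paragraphs = [
    "3.2.1 Breach of contract ........ 30",                # TOC line
    "Invoices are payable within 30 days.",                # same keyword, other topic
    "On breach of contract, the cure period is 30 days.",  # intended target
]
print(locate(paragraphs, "30", "breach of contract", (1, 2)))  # 2
print(locate(paragraphs, "30", "breach of contract", (0, 2)))  # None
```

The second call returns None because widening the range lets the TOC line match as well, which is the collision the paragraph range is meant to exclude.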

Scenario guide

Scenario | Approach
Contract/report/memo from template | docx-js greenfield with TOC/headers
Bulk template fields | docx_utils.safe_replace_text
Patent specification revisions | keyword_locator + docx_utils.insert_paragraphs_after
Requirements ↔ tests trace links | docx_pipeline.bidirectional_link
OA responses as track changes | unpack → insert <w:ins> / <w:del>
Review annotations | comment.build_mapping_comment + XML comments
Cross-doc references | pipeline.locate_and_apply(doc, specs)

Invocation

/docx Generate a contract from this template with client fields filled

Natural-language triggers:

  • “Create a Word doc”
  • “Rewrite chapter 3 with Q2 metrics”
  • “Add bidirectional hyperlinks between these specs”
  • “Batch-insert review comments”
  • “Accept every tracked change”

How this skill relates to others

Skill | Role
docx (this page) | Low-level DOCX file APIs
document-editing | Cross-format search → edit → snapshot flows (calls docx internally)
xlsx | Spreadsheet analog
html-report | Interactive HTML dashboards
patent-doc-formatter | Industry-specific formatter atop docx

Hard rules (breaking them corrupts packages)

# | Rule
R1 | Inside execute_code, always import ragbase_skills.docx; never stitch raw OOXML strings by hand
R2 | Disambiguate with para_range + keywords; no global str.find() (TOC collisions)
R3 | Insert mid-document via docx_utils.insert_paragraphs_after() (doc.add_paragraph() only appends at the end of the document)
R4 | Create temp dirs via tempfile.mkdtemp(prefix="docx_")
R5 | Run docx_validator.validate_docx(path) before shipping
R6 | Cross-doc links cannot rely on w:anchor; use r:id relationships
R7 | Bookmark names ≤ 40 chars, bm_ prefix, anchored on body paragraphs
R8 | Leading/trailing spaces inside <w:t> need xml:space="preserve"
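
R8 is the rule most easily broken when text nodes are created programmatically. A small sketch with ElementTree shows the idea; make_text is a hypothetical helper, not part of docx_utils.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def make_text(text: str) -> ET.Element:
    """Create a w:t element, adding xml:space="preserve" when the text has
    leading or trailing whitespace that would otherwise be dropped."""
    node = ET.Element(f"{{{W}}}t")
    node.text = text
    if text != text.strip():
        node.set(XML_SPACE, "preserve")
    return node


print(ET.tostring(make_text(" REQ_001"), encoding="unicode"))
print(ET.tostring(make_text("REQ_001"), encoding="unicode"))
```

Only the first element carries the attribute; text without edge whitespace is left untouched.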
