Nox-Lumen Mfg

Observability

Why observability matters

In enterprise settings, AI must be not only usable but governable:

Compliance audits — every judgment needs a paper trail
Incident triage — why did the Agent decide this?
Cost governance — token spend, model calls, external API usage
Quality traceability — every step toward a conclusion is replayable

Four observability layers

Layer	What you observe	Typical tooling
Business	Sessions, skill calls, delivered artifacts	Workbench audit logs
Agent	Thoughts, tool calls, sub-agent delegation	Session execution trace
System	Model calls, tokens, latency	Prometheus + Grafana
Infrastructure	CPU / memory / network / storage	Standard cloud monitoring

Session replay

Every session can be fully replayed.

The replay panel includes:

Timestamp for each step
Each model call (prompt + completion)
Each tool call (arguments + return value)
Token usage and cost per step

This makes it possible to reproduce an incident within minutes and supports compliance auditing.
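As a rough illustration of what a replayable step record might hold, here is a minimal sketch. The `ReplayStep` schema, its field names, and the sample values are assumptions made for this example, not the product's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass
class ReplayStep:
    """One step in a session trace (hypothetical schema, for illustration only)."""
    timestamp: datetime
    kind: str                  # "model_call" or "tool_call"
    inputs: dict[str, Any]     # prompt, or tool arguments
    outputs: dict[str, Any]    # completion, or tool return value
    tokens: int = 0
    cost_usd: float = 0.0


def replay(steps: list[ReplayStep]) -> list[str]:
    """Render steps in chronological order with running token and cost totals."""
    lines = []
    total_tokens, total_cost = 0, 0.0
    for i, step in enumerate(sorted(steps, key=lambda s: s.timestamp), start=1):
        total_tokens += step.tokens
        total_cost += step.cost_usd
        lines.append(
            f"{i}. {step.timestamp.isoformat()} {step.kind} "
            f"tokens={step.tokens} (running {total_tokens}) "
            f"cost=${step.cost_usd:.4f} (running ${total_cost:.4f})"
        )
    return lines


# Sample data is made up; steps are deliberately listed out of order.
steps = [
    ReplayStep(datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc), "tool_call",
               {"name": "lookup_part", "args": {"part_id": "A-17"}},
               {"result": "in stock"}),
    ReplayStep(datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc), "model_call",
               {"prompt": "Is part A-17 available?"},
               {"completion": "Let me check inventory."},
               tokens=42, cost_usd=0.0021),
]
for line in replay(steps):
    print(line)
```

Sorting by timestamp before rendering is what lets a replay reconstruct the order of events even if records arrive or are stored out of sequence.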

Ledger: disciplined step execution

Ledger is the Orchestrator’s core component. It incrementally records:

step_status tracking (pending / running / success / failed)
Confidence for each matched item
Semantic backtracking on low-confidence matches, for multi-turn validation
Discipline checks and replan fuse (circuit-breaker) conditions

Agents cannot skip Ledger writes: the model is write-then-act, not “log when done.”
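The write-then-act rule can be sketched as follows. This is a minimal illustration under stated assumptions: the `Ledger` class, `run_step` helper, and step name are hypothetical, and only the four `step_status` values come from the list above.

```python
from enum import Enum
from typing import Callable


class StepStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


class Ledger:
    """Append-only record of step status changes (illustrative, not the real Ledger)."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, StepStatus]] = []

    def write(self, step_id: str, status: StepStatus) -> None:
        self.entries.append((step_id, status))


def run_step(ledger: Ledger, step_id: str, action: Callable[[], object]) -> object:
    """Record 'running' before acting, then record the outcome."""
    ledger.write(step_id, StepStatus.RUNNING)   # write first ...
    try:
        result = action()                       # ... then act
    except Exception:
        ledger.write(step_id, StepStatus.FAILED)
        raise
    ledger.write(step_id, StepStatus.SUCCESS)
    return result


ledger = Ledger()
ledger.write("match-invoice", StepStatus.PENDING)
run_step(ledger, "match-invoice", lambda: "matched")
print([(step, status.value) for step, status in ledger.entries])
```

Because the only way to run an action in this sketch is through `run_step`, a step that crashes still leaves a `running` entry followed by `failed`, so there is never an action without a prior ledger record.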

Distributed tracing

Built on OpenTelemetry:


Trace ID spans the full request path
Calls across agents, sub-agents, and external APIs are linked
Integrates with standard APM systems (Jaeger, Tempo, DataDog, etc.)
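To make the idea of one Trace ID spanning several hops concrete, here is a standard-library sketch that mimics the W3C Trace Context `traceparent` header, the propagation format OpenTelemetry uses by default. It deliberately does not use the OpenTelemetry SDK, and the helper names `new_traceparent` and `child_traceparent` are hypothetical.

```python
import secrets


def new_traceparent() -> str:
    """Start a trace: version 00, 16-byte trace ID, 8-byte span ID, sampled flag 01."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(parent: str) -> str:
    """Keep the trace ID and flags; mint a new span ID for the downstream call."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


root = new_traceparent()                      # request enters the agent
sub_agent = child_traceparent(root)           # agent delegates to a sub-agent
external_api = child_traceparent(sub_agent)   # sub-agent calls an external API

for hop, header in [("agent", root), ("sub-agent", sub_agent), ("external API", external_api)]:
    print(f"{hop:12s} traceparent: {header}")
```

Every hop shares the same 32-character trace ID while carrying its own span ID, which is what allows an APM backend to stitch the calls into a single trace.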

Cost visibility

Costs can be aggregated by:

Tenant / user / session / skill
Model / time window
Success / failure

These breakdowns are useful for model cost optimization and budget control.
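Grouping cost along one or more of these dimensions could look like the sketch below. The record fields, tenant and model names, and cost figures are all made up for illustration; the real usage records may be shaped differently.

```python
from collections import defaultdict

# Hypothetical usage records; field names and figures are illustrative only.
records = [
    {"tenant": "acme", "skill": "quote", "model": "model-a", "ok": True,  "cost_usd": 0.012},
    {"tenant": "acme", "skill": "quote", "model": "model-b", "ok": False, "cost_usd": 0.003},
    {"tenant": "beta", "skill": "audit", "model": "model-a", "ok": True,  "cost_usd": 0.020},
]


def aggregate_cost(rows, *keys):
    """Sum cost_usd grouped by the given dimension keys."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["cost_usd"]
    return dict(totals)


print(aggregate_cost(records, "tenant"))        # cost per tenant
print(aggregate_cost(records, "model", "ok"))   # cost per model, split by success / failure
```

Splitting by success and failure is useful because spend on failed runs is often the first place to look when tightening a budget.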

Related docs

Hooks: low-level entry point for observability hooks
Monitoring & operations
Security & compliance
