Nox-Lumen Mfg

Observability

Why observability matters

In enterprise settings, AI must be not only usable but governable:

  • Compliance audits — every judgment needs a paper trail
  • Incident triage — why did the Agent decide this?
  • Cost governance — token spend, model calls, external API usage
  • Quality traceability — every step toward a conclusion is replayable

Four observability layers

Layer          | What you observe                           | Typical tooling
Business       | Sessions, skill calls, delivered artifacts | Workbench audit logs
Agent          | Thoughts, tool calls, sub-agent delegation | Session execution trace
System         | Model calls, tokens, latency               | Prometheus + Grafana
Infrastructure | CPU / memory / network / storage           | Standard cloud monitoring

Session replay

Every session can be fully replayed.

The replay panel includes:

  • Timestamp for each step
  • Each model call (prompt + completion)
  • Each tool call (arguments + return value)
  • Token usage and cost per step
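
The fields listed above map naturally onto a per-step record. The following is a minimal sketch of what such a record might look like, not the product's actual schema: the `ReplayStep` class, its field names, and the `replay` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ReplayStep:
    """One replayable step. Field names are hypothetical, not the product schema."""
    timestamp: str               # ISO 8601 UTC time of the step
    kind: str                    # "model_call" or "tool_call"
    request: dict[str, Any]      # prompt for a model call, arguments for a tool call
    response: dict[str, Any]     # completion for a model call, return value for a tool call
    tokens: int = 0              # tokens consumed by this step
    cost_usd: float = 0.0        # cost attributed to this step


def replay(steps: list[ReplayStep]) -> list[str]:
    """Return one summary line per step, in timestamp order.

    Sorting the strings directly is only valid when every timestamp uses
    the same ISO 8601 format and time zone.
    """
    ordered = sorted(steps, key=lambda s: s.timestamp)
    return [
        f"{i}. {s.timestamp} {s.kind} tokens={s.tokens} cost=${s.cost_usd:.4f}"
        for i, s in enumerate(ordered, start=1)
    ]
```

Keeping the request and response on the same record as the token and cost figures means a single pass over the steps can serve both incident reproduction and cost review.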

This makes it possible to reproduce an incident within minutes and gives compliance audits a complete record to work from.

Ledger: disciplined step execution

Ledger is the Orchestrator’s core component. It records the following incrementally, as each step runs:

  • step_status tracking (pending / running / success / failed)
  • Confidence for each matched item
  • Semantic backtracking on low-confidence matches, for multi-turn validation
  • Discipline checks and replan fuse conditions

Agents cannot skip Ledger writes: the rule is write-then-act, not “log when done.”
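
To make the write-then-act rule concrete, here is a minimal in-memory sketch. It is an illustration under stated assumptions, not the Orchestrator's implementation: the `Ledger` class, `LedgerEntry`, and `run_step` are hypothetical names, and only the four `step_status` values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class StepStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class LedgerEntry:
    step_id: str
    status: StepStatus


class Ledger:
    """Append-only record of step status changes (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self.entries: list[LedgerEntry] = []

    def write(self, step_id: str, status: StepStatus) -> None:
        self.entries.append(LedgerEntry(step_id, status))

    def run_step(self, step_id: str, action: Callable[[], Any]) -> Any:
        # Write first: the ledger records that the step exists and has
        # started before the action can produce any side effect.
        self.write(step_id, StepStatus.PENDING)
        self.write(step_id, StepStatus.RUNNING)
        try:
            result = action()
        except Exception:
            # A failure is recorded before the error propagates.
            self.write(step_id, StepStatus.FAILED)
            raise
        self.write(step_id, StepStatus.SUCCESS)
        return result
```

Because the only way to execute an action in this sketch is through `run_step`, a crash in the middle of a step still leaves a `running` entry behind, which is the property that "log when done" cannot provide.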

Distributed tracing

Built on OpenTelemetry:

  • Trace ID spans the full request path
  • Calls across agents, sub-agents, and external APIs are linked
  • Integrates with standard APM systems (Jaeger, Tempo, DataDog, etc.)
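
The way a single trace ID links calls across agents can be illustrated with the W3C Trace Context `traceparent` header, which OpenTelemetry propagates by default. The sketch below uses only the standard library to show the idea; in practice the OpenTelemetry SDK generates and propagates this header for you, and the helper names here (`new_traceparent`, `child_traceparent`, `trace_id_of`) are hypothetical.

```python
import secrets


def new_traceparent() -> str:
    """Start a trace: version 00, 16-byte trace ID, 8-byte span ID, sampled flag."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(parent: str) -> str:
    """Keep the caller's trace ID and mint a new span ID for the downstream call."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(header: str) -> str:
    """Extract the trace ID (the second field) from a traceparent header."""
    return header.split("-")[1]


# Agent -> sub-agent -> external API: three spans, one trace ID.
agent = new_traceparent()
sub_agent = child_traceparent(agent)
external_api = child_traceparent(sub_agent)
```

Each hop gets its own span ID while the trace ID stays constant, which is what lets a backend such as Jaeger or Tempo stitch the hops into one request path.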

Cost visibility

Aggregate by:

  • Tenant / user / session / skill
  • Model / time window
  • Success / failure

Useful for model cost optimization and budget control.
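
As an illustration of aggregating along the dimensions listed above, the following sketch groups usage records by any combination of keys. The record layout (`tenant`, `model`, `success`, `tokens`, `cost_usd`) and the `aggregate_cost` function are assumptions for this example, not the product's actual data model.

```python
from collections import defaultdict
from typing import Any


def aggregate_cost(records: list[dict[str, Any]], keys: list[str]) -> dict[tuple, dict[str, Any]]:
    """Sum cost, tokens, and call count per group.

    `keys` names the dimensions to group by, e.g. ["tenant"] or
    ["tenant", "model"]. Each group is identified by a tuple of values.
    """
    totals: dict[tuple, dict[str, Any]] = defaultdict(
        lambda: {"cost_usd": 0.0, "tokens": 0, "calls": 0}
    )
    for record in records:
        group = tuple(record[k] for k in keys)
        totals[group]["cost_usd"] += record["cost_usd"]
        totals[group]["tokens"] += record["tokens"]
        totals[group]["calls"] += 1
    return dict(totals)
```

Calling `aggregate_cost(records, ["tenant", "success"])`, for example, separates each tenant's spend on successful calls from its spend on failed ones, which is often the first question in a budget review.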
