Nox-Lumen Mfg

Monitoring and operations

Three monitoring pillars

System monitoring

KPIs

| Metric | Suggested target | Notes |
|---|---|---|
| CPU utilization | <70% sustained | >80% → scale out |
| RAM utilization | <75% | Primary OOM risk signal |
| Disk I/O utilization | <70% | Hot on ES / MinIO tiers |
| Disk capacity | <80% | Alert immediately above |
| Network bandwidth | <70% | Heavy east/west on ES & object store |
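
As a rough illustration of how the targets above could be checked from a script, the sketch below compares one set of utilization readings against the suggested limits. The dictionary keys and the `check_kpis` helper are hypothetical names chosen for this example, not product metric identifiers.

```python
# Suggested upper limits from the KPI table, as fractions of capacity.
# The keys are illustrative labels, not product metric identifiers.
KPI_TARGETS = {
    "cpu": 0.70,
    "ram": 0.75,
    "disk_io": 0.70,
    "disk_capacity": 0.80,
    "network": 0.70,
}


def check_kpis(sample: dict[str, float]) -> list[str]:
    """Return the names of metrics whose reading is at or above its target."""
    breaches = []
    for name, limit in KPI_TARGETS.items():
        value = sample.get(name)
        # Metrics missing from the sample are skipped rather than flagged.
        if value is not None and value >= limit:
            breaches.append(name)
    return breaches


if __name__ == "__main__":
    reading = {"cpu": 0.82, "ram": 0.60, "disk_capacity": 0.81}
    print(check_kpis(reading))
```

A real deployment would more likely express these limits as alert rules in the monitoring stack; the function form is only meant to make the comparison explicit.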

Integrations

  • Prometheus + Grafana — dashboards ship with the product
  • Zabbix — common in mainland enterprises
  • Domestic suites — Huawei and other AIOps platforms

Endpoints:

/metrics       # Prometheus scrape
/v1/health     # Liveness
/v1/ready      # Readiness (Kubernetes)
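
The liveness and readiness endpoints can be polled from any HTTP client. The following is a minimal sketch using only the Python standard library. It assumes that both endpoints answer with HTTP 200 when healthy, which the source does not state, and the `probe` and `check_service` helpers are hypothetical.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 2.0,
          opener=urllib.request.urlopen) -> bool:
    """Return True if GET base_url + path answers with HTTP 200 (assumed healthy)."""
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a 4xx/5xx status raised as HTTPError.
        return False


def check_service(base_url: str, opener=urllib.request.urlopen) -> dict[str, bool]:
    """Probe the liveness and readiness endpoints listed above."""
    return {
        "live": probe(base_url, "/v1/health", opener=opener),
        "ready": probe(base_url, "/v1/ready", opener=opener),
    }
```

For example, `check_service("http://localhost:8080")` would return a dictionary such as `{"live": True, "ready": False}` while a service is still starting; the host and port here are placeholders. The `opener` parameter exists only so the HTTP call can be replaced in tests.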

Application monitoring

Agent runtime metrics

| Metric | Meaning | Watch for |
|---|---|---|
| session.active | Concurrent sessions | Load pressure |
| agent.step.duration | Step latency (p50/p95/p99) | User-perceived slowness |
| tool.call.success_rate | Tool outcomes | Upstream system health |
| compaction.triggered | Context compaction events | Overlong sessions |
| hook.execution.count | Hooks fired | Misconfigured guardrails |
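
To make the p50/p95/p99 figures for step latency concrete, here is a small sketch that computes them from raw duration samples with the nearest-rank method. Monitoring backends often estimate percentiles from histograms instead, so treat this as an illustration of the definition rather than of how the product computes the metric. The function names are hypothetical.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(durations_ms: list[float]) -> dict[str, float]:
    """Summarize step durations as p50, p95 and p99."""
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 95, 99)}
```

A widening gap between p50 and p99 usually means a minority of steps are slow, which an average would hide.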

Error budgets

| Metric | Alert when |
|---|---|
| API 5xx rate | >1% for 5 minutes |
| LLM failure rate | >5% for 10 minutes |
| Per-tool failure rate | >10% for 15 minutes |
| Session timeouts | >3% |
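
The "for N minutes" wording means a breach has to persist before it pages anyone. One way to sketch that logic, assuming per-minute counts of failures and total requests are available, is shown below. The `Budget` type, the dictionary keys and the per-minute sampling are assumptions for illustration. The session-timeout row is left out because the table gives it no duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    threshold: float      # maximum tolerated failure ratio
    sustain_minutes: int  # how long the breach must persist before alerting


# Values from the error-budget table; the keys are illustrative labels.
BUDGETS = {
    "api_5xx": Budget(0.01, 5),
    "llm_failure": Budget(0.05, 10),
    "tool_failure": Budget(0.10, 15),
}


def should_alert(per_minute: list[tuple[int, int]], budget: Budget) -> bool:
    """Decide whether a sustained breach has occurred.

    per_minute holds (failures, total) pairs, oldest first, one per minute.
    Alert only if every one of the most recent sustain_minutes samples
    exceeds the threshold; minutes without traffic count as healthy.
    """
    window = per_minute[-budget.sustain_minutes:]
    if len(window) < budget.sustain_minutes:
        return False  # not enough history yet
    return all(total > 0 and failures / total > budget.threshold
               for failures, total in window)
```

Requiring every sample in the window to breach mirrors how a `for:` clause behaves in Prometheus-style alerting: a single healthy minute resets the condition.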

LLM cost monitoring

LLM spend is the dominant variable cost; a built-in dashboard tracks:

| Signal | Granularity |
|---|---|
| Token consumption | Tenant / user / session / model / time window |
| Estimated cost (CNY) | Model rate cards |
| Latency | Model + call type |
| Cache hit rate | Prompt cache effectiveness |
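
The arithmetic behind the cost and cache signals is simple enough to sketch. The example below assumes a per-model rate card priced per 1,000 input and output tokens. The model names, prices, record fields and function names are all invented for illustration and do not reflect the product's rate cards or data model.

```python
from collections import defaultdict
from decimal import Decimal

# Placeholder rate card in CNY per 1,000 tokens: (input, output).
# Model names and prices are invented for illustration only.
RATE_CARD = {
    "model-small": (Decimal("0.001"), Decimal("0.002")),
    "model-large": (Decimal("0.010"), Decimal("0.030")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated cost in CNY for one call, using the placeholder rate card."""
    in_rate, out_rate = RATE_CARD[model]
    return (in_rate * input_tokens + out_rate * output_tokens) / 1000


def cost_by_tenant(calls: list[dict]) -> dict[str, Decimal]:
    """Sum the estimated cost per tenant from a list of call records."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for call in calls:
        totals[call["tenant"]] += estimate_cost(
            call["model"], call["input_tokens"], call["output_tokens"])
    return dict(totals)


def cache_hit_rate(cached_tokens: int, input_tokens: int) -> float:
    """Share of prompt tokens served from the prompt cache (0.0 if no input)."""
    return cached_tokens / input_tokens if input_tokens else 0.0
```

`Decimal` is used for the money values so that sums across many calls do not accumulate binary floating-point error.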

Savings levers

| Symptom | Action |
|---|---|
| High per-session tokens | Review compaction; split huge documents |
| Tenant spike | Check for runaway jobs / apply quotas |
| Elevated latency | Consider faster-tier models |
| Low cache utilization | Template prompts / reuse stable prefixes |

Budget alerts

Configure daily / monthly thresholds:

  • Soft: notify admins; processing continues
  • Hard: auto-downgrade to cheaper models or pause usage
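
The two threshold types can be read as a simple decision on current spend. A minimal sketch follows, with hypothetical names; whether the hard action downgrades models or pauses usage is left to the caller, since the source lists both as options.

```python
from enum import Enum


class BudgetAction(Enum):
    NONE = "none"
    NOTIFY = "notify"      # soft threshold: notify admins, keep processing
    RESTRICT = "restrict"  # hard threshold: downgrade models or pause usage


def budget_action(spend: float, soft_limit: float, hard_limit: float) -> BudgetAction:
    """Pick the action for the current daily or monthly spend."""
    if soft_limit > hard_limit:
        raise ValueError("soft_limit must not exceed hard_limit")
    if spend >= hard_limit:
        return BudgetAction.RESTRICT
    if spend >= soft_limit:
        return BudgetAction.NOTIFY
    return BudgetAction.NONE
```

For instance, with a soft limit of 800 and a hard limit of 1000 (placeholder figures), a spend of 850 yields `BudgetAction.NOTIFY`.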

Logging

Default verbosity

| Subsystem | Level |
|---|---|
| API backends | INFO |
| Agent runtime | INFO |
| Hooks / tools | INFO |
| Audit | Always INFO |

Shipping logs

  • Container logs → stdout/stderr
  • Forward with Filebeat / Fluentd / Vector → ELK / Loki
  • Audit feeds use separate pipelines from app logs
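
For a custom tool or hook that runs alongside the platform, the same conventions (application logs on stdout, audit records on their own path, audit fixed at INFO) could be approximated with the standard `logging` module. This is a sketch under assumptions: the logger name `audit` and the use of stderr as the separate audit stream are choices made for the example, not documented product behavior.

```python
import logging
import sys


def configure_logging(app_level=logging.INFO, app_stream=None, audit_stream=None):
    """Send application logs and audit logs to separate streams.

    Returns the audit logger. The logger name "audit" and the stderr default
    for audit output are assumptions made for this sketch.
    """
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

    app_handler = logging.StreamHandler(app_stream or sys.stdout)
    app_handler.setFormatter(formatter)
    root = logging.getLogger()
    root.handlers.clear()
    root.addHandler(app_handler)
    root.setLevel(app_level)

    audit_handler = logging.StreamHandler(audit_stream or sys.stderr)
    audit_handler.setFormatter(formatter)
    audit = logging.getLogger("audit")
    audit.handlers.clear()
    audit.addHandler(audit_handler)
    audit.setLevel(logging.INFO)  # audit stays at INFO regardless of app_level
    audit.propagate = False       # keep audit records out of the app stream
    return audit
```

A log forwarder such as Filebeat, Fluentd or Vector can then route the two streams to different destinations.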

Sample alert rules

# API 5xx errors over a 5-minute window
- name: high-error-rate
  expr: rate(http_5xx[5m]) > 0.01
  severity: critical

# LLM call errors over a 10-minute window
- name: llm-call-failure
  expr: rate(llm_error[10m]) > 0.05
  severity: warning

# Disk usage above 85%
- name: disk-pressure
  expr: disk_used_percent > 85
  severity: critical

# Per-tenant token throughput over a 1-hour window
- name: tenant-token-spike
  expr: sum by (tenant) (rate(llm_tokens[1h])) > 1000000
  severity: warning

Channels: Feishu / WeCom / email / SMS / webhook.

Performance tuning

Horizontal scaling

| Tier | Technique | Typical trigger |
|---|---|---|
| API | More replicas behind Nginx/HAProxy | CPU >70% |
| Agent workers | Add worker pools | Queue depth thresholds |
| Search | Shard / add nodes | p95 queries >500 ms |
| Object storage | Add nodes | Capacity >70% |
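
The triggers in the table can be combined into a single check that suggests which tier to scale out. The sketch below is illustrative only: the function and parameter names are hypothetical, and because the source gives no number for the queue-depth threshold, the caller has to supply one.

```python
def scaling_actions(
    *,
    api_cpu: float,
    queue_depth: int,
    queue_depth_limit: int,
    search_p95_ms: float,
    storage_used: float,
) -> list[str]:
    """Map current readings to the scale-out techniques from the table.

    Utilization values are fractions (0.0 to 1.0). queue_depth_limit has no
    documented default, so the caller must choose it.
    """
    actions = []
    if api_cpu > 0.70:
        actions.append("api: add replicas")
    if queue_depth > queue_depth_limit:
        actions.append("agent workers: add worker pool")
    if search_p95_ms > 500:
        actions.append("search: shard or add nodes")
    if storage_used > 0.70:
        actions.append("object storage: add nodes")
    return actions
```

In practice these conditions would normally be evaluated over a sustained window rather than on a single reading, to avoid scaling on short spikes.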

Vertical scaling

Scale the database and ES nodes vertically (more RAM/CPU) first; this is often the cheaper option until cluster limits are reached.

LLM acceleration

  • Vendors or plans with higher concurrency limits
  • On-prem inference with vLLM / TensorRT-LLM
  • SSE streaming improves perceived responsiveness

Backup & disaster recovery

Backup cadence

| Asset | Frequency | Mechanism |
|---|---|---|
| RDBMS | Daily full + 5 min WAL | pg_basebackup / equivalents |
| Search | Weekly snapshot | ES snapshot to MinIO/S3-compatible |
| Object store | Near real-time | MinIO replication |
| Config | Each change | Git-tracked infra-as-code |

Targets

| Tier | RTO | RPO |
|---|---|---|
| Mission-critical | ≤2 h | ≤15 min data loss budget |
| Non-critical | ≤24 h | ≤24 h |
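
An RPO is met only if the gap between recovery points never exceeds it, so the backup cadence and the RPO targets can be checked against each other. The following sketch encodes the two RPO values from the table; the tier keys and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# RPO targets from the table; the tier keys are illustrative labels.
RPO = {
    "mission_critical": timedelta(minutes=15),
    "non_critical": timedelta(hours=24),
}


def interval_meets_rpo(backup_interval: timedelta, tier: str) -> bool:
    """Worst-case data loss equals the gap between recovery points."""
    return backup_interval <= RPO[tier]


def recovery_point_is_fresh(last_point: datetime, tier: str,
                            now: Optional[datetime] = None) -> bool:
    """True if the newest recovery point is still inside the RPO window."""
    now = now or datetime.now(timezone.utc)
    return now - last_point <= RPO[tier]
```

With WAL shipped every 5 minutes, `interval_meets_rpo(timedelta(minutes=5), "mission_critical")` holds, since 5 minutes is within the 15-minute budget. `recovery_point_is_fresh` is the runtime counterpart: it can back an alert when archiving stalls.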

Topology options

  • Metro active/active — sub-2 min failover within region
  • Geo async — ~1 h RPO cross-region DR

Automation shipped with the product

Scripts cover:

  • One-click upgrade / rollback
  • Daily health reports
  • Archival / retention jobs
  • Performance sampling & flamegraphs
