March 2026 · Jordaaan Hill · 14 min read

The Observability Architecture

OTEL + Claude Code Hooks to KPI Dashboards — how we instrument, monitor, and secure every layer of the AI agent stack.

Why Observability Matters for Agent Teams

When autonomous agents make decisions, execute code, and interact with production systems, you need visibility into every layer: not just whether the system is running, but what it is doing, what it is costing, and whether anyone is gaming it. Traditional APM doesn't cover this. You need purpose-built observability for AI agent infrastructure.

This paper documents the full observability architecture powering Organized AI's managed agent deployments — from Claude Code hooks that capture developer interactions, through OpenTelemetry pipelines that route telemetry, to KPI dashboards that surface actionable metrics and FrawdBot security that catches behavioral anomalies before they become incidents.

The Eight-Layer Stack

The observability architecture spans eight distinct layers, each with specific responsibilities. Data flows from instrumentation (Layer 1) through to storage (Layer 6), while security monitoring (Layer 7) operates as a cross-cutting concern with access to every layer.

LAYER 1: CLAUDE CODE HOOKS (Developer Workstation)
  PostToolUse (async) → Write|Edit|Bash → file changes + test results
  Stop (agent) → task completion verification + test pass rate
  SessionEnd → session duration, tool calls, git diff stats
  PreCompact → context utilization %, token burn rate
  Emitter: .claude/hooks/otel-emit.js → @opentelemetry/sdk-node
      │
      │ OTLP/HTTP :4318
      ▼
LAYER 2: OTEL COLLECTOR (Mac Mini M4)
  Receivers → Batch Processor (5s / 100) → Exporters
  Fan-out to Tier 1 (infra) and Tier 2 (LLM)
      │
  ┌───┼───┐
  ▼   ▼   ▼
LAYER 3: AGENT RUNTIME
  OpenClaw Gateway :3000
  ClawRouter (14-dimension scoring, <1ms)
  WhatsApp · Telegram · Discord · Web API · iMessage · Signal
  Ollama :11434 → SIMPLE(3B) · MEDIUM(8B) · COMPLEX(70B) · CODE(7B)
  diagnostics-otel → GenAI semantic convention spans
      │
      ▼
LAYER 4: TOKEN FLOAT CACHING
  Semantic Cache (Redis :6379) → exact match + vector similarity (0.92)
  Token Broker :5050 → provider arbitrage across 6+ providers
  Token Pool Manager → pre-purchased bulk pools + spot pricing
  Cache hits = $0 cost = pure margin → funds hardware leases
      │
      ▼
LAYER 5: KPI DASHBOARDS
  Prometheus :9090 → time-series metrics, 15s scrape
  Grafana Tempo :3200 → distributed traces, TraceQL
  Langfuse :3100 → trace explorer, prompt mgmt, eval scores
  PostHog → product analytics, feature flags, HogAI queries
  Grafana :3001 → Cost | Tokens | Routing | SLA | Cache | Float
      │
      ▼
LAYER 6: DATA PERSISTENCE
  PostgreSQL :5432 → langfuse_db · posthog_db · grafana_db · frawdbot_db · token_float_db
  ClickHouse :8123 → metrics · events · traces · fraud analytics · cache analytics
  S3 Cold Storage → parquet export · backups · data lake · forensic archive
      │
      ▼
LAYER 7: FRAWDBOT SECURITY (cross-cutting)
  Agent Fraud → rogue tools, skill tampering, loop exploitation
  Insider Threat → off-hours access, privilege escalation, data exfil
  Cost Fraud → token laundering, tier manipulation, budget bypass
  Prompt Security → injection detection, jailbreak, PII exfiltration
  Cache Fraud → cache poisoning, pool draining, rate manipulation
      │
      ▼
LAYER 8: CLAWHERD FLEET ORCHESTRATOR
  Git repo as source of truth → configs, skills, fleet definitions
  Pull-based agent → clawherd-agent syncs every 5 min via launchd
  Roles: control-plane · dev-workstation · client-node · studio-lease
  SOPS + age encryption → secrets in git, decrypted only on target

Layer 1: Claude Code Hooks

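Every developer interaction with Claude Code generates telemetry. Four hook types capture different event classes:

PostToolUse (async, matching Write|Edit|Bash): file changes and test results
Stop (agent): task completion verification and test pass rate
SessionEnd: session duration, tool calls, and git diff stats
PreCompact: context utilization % and token burn rate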

All hooks feed into .claude/hooks/otel-emit.js, which converts the JSON each hook receives on stdin into OpenTelemetry spans using @opentelemetry/sdk-node. Spans ship to the OTEL Collector via OTLP/HTTP on port 4318.

Layer 2: OTEL Collector

The central nervous system. Receives telemetry from both Claude Code Hooks and OpenClaw's diagnostics-otel plugin. A batch processor (5-second timeout, 100-span batches) smooths traffic before fan-out to downstream exporters.
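The "5s / 100" rule means a batch is sent when 100 spans accumulate or 5 seconds pass since the first buffered span, whichever comes first. The Python model below is a simplified illustration for reasoning about batch sizes and added latency, not the collector's implementation; in the real collector the corresponding settings are `send_batch_size` and `timeout` on the `batch` processor.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BatchModel:
    """Simplified size-or-timeout batching."""
    max_size: int = 100        # mirrors the "100" in the 5s / 100 setting
    timeout_s: float = 5.0     # mirrors the "5s"
    buffer: list = field(default_factory=list)
    first_at: Optional[float] = None               # arrival time of the oldest buffered item
    flushed: list = field(default_factory=list)    # record of emitted batches

    def add(self, item, now: float) -> None:
        if not self.buffer:
            self.first_at = now  # the timeout clock starts with the first buffered item
        self.buffer.append(item)
        if len(self.buffer) >= self.max_size:
            self._flush()

    def tick(self, now: float) -> None:
        """Call periodically; flushes a partial batch once the timeout has elapsed."""
        if self.buffer and now - self.first_at >= self.timeout_s:
            self._flush()

    def _flush(self) -> None:
        self.flushed.append(self.buffer)
        self.buffer = []
        self.first_at = None
```

Feeding 250 spans at once yields two full batches of 100 immediately, and the remaining 50 wait up to 5 seconds before being sent, which bounds the extra latency a quiet workstation's spans can see.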

Two tiers of export: Tier 1 sends infrastructure metrics to Prometheus and traces to Tempo. Tier 2 routes LLM-specific data to Langfuse and product analytics to PostHog. This separation keeps infrastructure SRE dashboards fast while giving ML teams their own observability plane.
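The two-tier split amounts to a routing decision per telemetry item. The Python function below is a hypothetical illustration: metrics go to Prometheus and traces to Tempo (Tier 1), and traces carrying GenAI attributes are additionally routed to Langfuse while product events go to PostHog (Tier 2). Keying on the `gen_ai.` attribute prefix and the `product_event` signal name are assumptions, not documented behavior.

```python
def route(signal: str, attributes: dict) -> list:
    """Return exporter destinations for one telemetry item (hypothetical keying)."""
    destinations = []
    if signal == "metric":
        destinations.append("prometheus")        # Tier 1: infrastructure metrics
    elif signal == "trace":
        destinations.append("tempo")             # Tier 1: every trace
        if any(k.startswith("gen_ai.") for k in attributes):
            destinations.append("langfuse")      # Tier 2: LLM-specific spans
    elif signal == "product_event":
        destinations.append("posthog")           # Tier 2: product analytics
    return destinations
```

In the collector itself this fan-out would be expressed as pipelines and exporters in configuration rather than as application code.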

Layer 3: Agent Runtime

OpenClaw Gateway on port 3000 handles inbound messages from six channels: WhatsApp, Telegram, Discord, Web API, iMessage, and Signal. ClawRouter scores every request across 14 dimensions in under 1ms, routing to the optimal model tier.
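ClawRouter's 14 dimensions are not listed here, so the following is only a shape sketch: a weighted score over three hypothetical features, mapped onto the four Ollama tiers by hypothetical thresholds. It illustrates why this kind of scoring can run in well under a millisecond (a few regex checks and a weighted sum), not what ClawRouter actually measures.

```python
import re

# Hypothetical dimensions, weights, and patterns; the real router uses 14 undisclosed ones.
WEIGHTS = {"length": 0.4, "reasoning_terms": 0.4, "multi_step": 0.2}
REASONING = re.compile(r"\b(why|prove|analy[sz]e|compare|trade-?offs?)\b", re.I)
MULTI_STEP = re.compile(r"\b(then|after that|step \d)\b", re.I)
CODE_HINT = re.compile(r"```|\bdef |\bfunction\b|\bclass |\bstack trace\b", re.I)


def features(prompt: str) -> dict:
    """Score each hypothetical dimension in [0, 1]."""
    return {
        "length": min(len(prompt.split()) / 200, 1.0),
        "reasoning_terms": min(len(REASONING.findall(prompt)) / 3, 1.0),
        "multi_step": 1.0 if MULTI_STEP.search(prompt) else 0.0,
    }


def route_tier(prompt: str) -> str:
    """Map a prompt onto SIMPLE / MEDIUM / COMPLEX / CODE."""
    if CODE_HINT.search(prompt):
        return "CODE"      # code-specialized 7B model
    score = sum(WEIGHTS[k] * v for k, v in features(prompt).items())
    if score < 0.2:
        return "SIMPLE"    # 3B
    if score < 0.5:
        return "MEDIUM"    # 8B
    return "COMPLEX"       # 70B
```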

Local inference through Ollama serves four model tiers: SIMPLE (3B parameters), MEDIUM (8B), COMPLEX (70B), and CODE (7B specialized). The diagnostics-otel plugin emits GenAI semantic convention spans for every inference call — model, tokens in/out, latency, and cost.
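For reference, these are the kinds of attributes a span for one inference call carries under the OpenTelemetry GenAI semantic conventions, which are still evolving. The helper is a sketch: `llm.cost_usd` is a hypothetical custom attribute (the sketch does not assume a standard cost attribute), the per-million-token prices are caller-supplied placeholders, and latency comes from the span's own start and end timestamps rather than from an attribute.

```python
def genai_span_attributes(model: str, input_tokens: int, output_tokens: int,
                          usd_per_m_input: float, usd_per_m_output: float) -> dict:
    """Attributes for one inference span; cost uses caller-supplied (hypothetical) prices."""
    cost = (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000
    return {
        "gen_ai.operation.name": "chat",          # GenAI semconv operation name
        "gen_ai.request.model": model,            # e.g. the Ollama model tag
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        "llm.cost_usd": round(cost, 6),           # hypothetical custom attribute, not semconv
    }
```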

Layer 4: Token Float Caching

The economic engine. Three caching strategies combine to eliminate redundant inference costs:

Combined hit rate target: 30-50%. Every cache hit costs $0 — pure margin.
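A two-stage lookup of the kind Layer 4 describes (exact match first, then vector similarity at a 0.92 threshold) can be sketched with in-memory structures. This Python version stands in for Redis and assumes the caller supplies embeddings from some embedding model; the brute-force cosine scan is for clarity, where a production cache would use a vector index.

```python
import hashlib
import math
from typing import Optional


def _cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.92):
        self.threshold = threshold
        self.exact = {}      # sha256(prompt) -> response
        self.vectors = []    # (embedding, response) pairs

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def put(self, prompt: str, embedding: list, response: str) -> None:
        self.exact[self._key(prompt)] = response
        self.vectors.append((embedding, response))

    def get(self, prompt: str, embedding: list) -> Optional[str]:
        hit = self.exact.get(self._key(prompt))   # stage 1: exact match
        if hit is not None:
            return hit
        best, best_sim = None, -1.0               # stage 2: nearest neighbour by cosine
        for vec, resp in self.vectors:
            sim = _cosine(embedding, vec)
            if sim > best_sim:
                best, best_sim = resp, sim
        return best if best_sim >= self.threshold else None
```

The threshold is the main tuning knob: raising it trades hit rate for a lower risk of serving an answer to a question that only looks similar.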

Token Float Economics: The Token Broker on port 5050 routes requests across providers at wholesale rates (Ollama, OpenRouter, Anthropic, OpenAI, Google, Groq) while billing clients at retail. The margin from cache hits + local inference + provider arbitrage funds Mac Studio hardware leases. Phase 1: 3 units at $900-1,200/mo. Phase 3: 15 units at $4,500-6,000/mo. ROI per unit: ~10-15x lease cost in client revenue.
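The phase figures imply a constant per-unit lease, which is worth making explicit. The arithmetic below uses only the numbers stated above; the revenue range simply applies the ~10-15x multiple to that per-unit lease and is not a reported result.

```python
def per_unit_range(units: int, monthly_low: float, monthly_high: float) -> tuple:
    """Monthly lease per Mac Studio, given a fleet-wide monthly range."""
    return monthly_low / units, monthly_high / units


def implied_revenue(lease_low: float, lease_high: float,
                    mult_low: float = 10, mult_high: float = 15) -> tuple:
    """Client revenue per unit implied by the stated ~10-15x multiple."""
    return lease_low * mult_low, lease_high * mult_high


phase1 = per_unit_range(3, 900, 1_200)      # Phase 1: 3 units
phase3 = per_unit_range(15, 4_500, 6_000)   # Phase 3: 15 units
print(phase1, phase3)                        # (300.0, 400.0) (300.0, 400.0)
print(implied_revenue(*phase1))             # (3000.0, 6000.0)
```

Both phases work out to $300-400 per unit per month, so the plan scales linearly with unit count, and the stated multiple implies roughly $3,000-6,000 of monthly client revenue per unit.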

Layer 5: KPI Dashboards

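Tier 1: Infrastructure

Prometheus (:9090) stores time-series metrics on a 15-second scrape interval. Grafana Tempo (:3200) stores distributed traces, queryable with TraceQL.

Tier 2: LLM Observability

Langfuse (:3100) provides the trace explorer, prompt management, and eval scores. PostHog handles product analytics, feature flags, and HogAI queries.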

Grafana dashboards on port 3001 surface: Cost Savings, Token Burn Rate, Routing Tier Distribution, Task Completion Rate, Latency p95, SLA Uptime, Cache Hit Rate, and Float Balance.
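Two of these KPIs are easy to misread, so here is how they could be computed from raw samples. This Python sketch uses the nearest-rank percentile definition; in the actual stack Grafana would more likely derive p95 from Prometheus histograms (for example with PromQL's `histogram_quantile`), which interpolates within buckets and can give a slightly different value.

```python
def p95(latencies_ms: list) -> float:
    """Nearest-rank 95th percentile: smallest sample with >= 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer arithmetic (1-based)
    return ordered[rank - 1]


def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache (the article targets 0.30-0.50)."""
    total = hits + misses
    return hits / total if total else 0.0
```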

Layer 6: Data Persistence

Store | Databases | Purpose
PostgreSQL :5432 | langfuse_db, posthog_db, grafana_db, frawdbot_db, token_float_db | Transactional data, audit trails, billing ledger
ClickHouse :8123 | Metrics, Events, Traces, Fraud Analytics, Cache Analytics | High-cardinality OLAP, materialized P&L views
S3 Cold Storage | Parquet, DB Backups, Data Lake, Forensic Archive, Billing Archive | Long-term retention, compliance, ML training data

Data lifecycle: hot (PostgreSQL, real-time) to warm (ClickHouse, analytical) to cold (S3 Parquet, archived). Daily partitioning with Snappy compression. Nightly pg_dump. Weekly ClickHouse backups to S3. Forensic archives are immutable with chain-of-custody logging.
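One common way to make chain-of-custody logging tamper-evident is a hash chain, where each record commits to the hash of the one before it. The article does not say how the forensic archive implements this, so treat the Python below as an illustration of that technique, with hypothetical field names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(chain: list, actor: str, action: str, object_key: str) -> dict:
    """Append a custody record whose hash covers its content plus the previous hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"actor": actor, "action": action, "object": object_key, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    chain.append(entry)
    return entry


def verify(chain: list) -> bool:
    """Recompute every hash; editing any record breaks its own link and all later ones."""
    prev = GENESIS
    for entry in chain:
        body = {k: entry[k] for k in ("actor", "action", "object", "prev")}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```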

Layer 7: FrawdBot Security

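Five detection modules run continuously against all agent activity:

Agent Fraud: rogue tools, skill tampering, loop exploitation
Insider Threat: off-hours access, privilege escalation, data exfiltration
Cost Fraud: token laundering, tier manipulation, budget bypass
Prompt Security: injection detection, jailbreaks, PII exfiltration
Cache Fraud: cache poisoning, pool draining, rate manipulation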

Ingestion spans four sources: Langfuse API traces, direct PostgreSQL reads, Prometheus PromQL baselines, and AlertManager webhooks. Response actions: kill session (Gateway API), adjust trust score (ClawRouter), push alerts (AlertManager), write to forensic storage.
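As an illustration of how a Prometheus baseline could feed a detection module, the sketch below flags a session whose token burn rate deviates from a baseline by more than a z-score threshold and maps severity onto the response actions listed above. The z-score method, both thresholds, and the action names are hypothetical; the article does not specify FrawdBot's detection logic.

```python
import statistics


def zscore(value: float, baseline: list) -> float:
    """How many baseline standard deviations `value` sits above the baseline mean."""
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    # A perfectly flat baseline has no spread to measure against; treat as no signal.
    return 0.0 if stdev == 0 else (value - mean) / stdev


def respond(tokens_per_min: float, baseline: list,
            alert_z: float = 3.0, kill_z: float = 6.0) -> list:
    """Map a burn-rate anomaly onto response actions (hypothetical thresholds and names)."""
    z = zscore(tokens_per_min, baseline)
    actions = []
    if z >= alert_z:
        actions += ["push_alert", "lower_trust_score", "write_forensic_record"]
    if z >= kill_z:
        actions.append("kill_session")
    return actions
```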

Layer 8: ClawHerd Fleet Orchestrator

Fleet configuration management for distributed Mac hardware. A Git repository serves as the single source of truth — configs, skill manifests, fleet definitions, and encrypted secrets all version-controlled.

Every Mac runs clawherd-agent, a pull-based daemon syncing every 5 minutes via launchd. The agent resolves its role from inventory.yaml, diffs desired state against actual state, and applies only what changed. Idempotent. No push. No SSH.
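The diff step is what makes the agent idempotent: applying the same desired state twice produces no second round of changes. A minimal Python sketch, assuming state is modeled as a flat mapping from managed path to content hash (a hypothetical representation; the article does not describe clawherd-agent's internal format):

```python
def plan(desired: dict, actual: dict) -> dict:
    """Minimal change set between desired and actual state (managed path -> content hash)."""
    return {
        "create": sorted(p for p in desired if p not in actual),
        "update": sorted(p for p in desired if p in actual and desired[p] != actual[p]),
        "delete": sorted(p for p in actual if p not in desired),
    }


def apply(desired: dict, actual: dict) -> dict:
    """Apply the plan to `actual` in place and return what changed."""
    changes = plan(desired, actual)
    for path in changes["create"] + changes["update"]:
        actual[path] = desired[path]   # stand-in for writing the file or config
    for path in changes["delete"]:
        del actual[path]               # stand-in for removing a no-longer-managed item
    return changes
```

A second run against an already-converged machine returns three empty lists. The 5-minute cadence would map to a launchd `StartInterval` of 300 seconds, assuming interval scheduling is what the agent uses.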

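Roles

Each Mac resolves one of four roles from inventory.yaml: control-plane, dev-workstation, client-node, or studio-lease.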

Skills Distribution

A registry.yaml marketplace catalog defines skill bundles: core (free), marketing, sales, product, data, gtm, dev. Per-client resolution combines tier minimums with vertical selection and addon toggles. Skills sync via rsync to ~/.openclaw/skills/ with hot-reload on the Gateway. Semver versioning with canary rollouts and automatic rollback on health check failure.
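Per-client resolution is essentially a set union: the bundles implied by the client's tier, plus its vertical, plus any enabled addons, with core always included. The bundle names come from the catalog above; the tier names and tier-to-bundle mapping are hypothetical, since the article does not say which tier includes which bundles.

```python
from typing import Optional

# Hypothetical tier minimums; bundle names match the registry.yaml catalog.
TIER_MINIMUMS = {
    "starter": {"core"},
    "growth": {"core", "marketing"},
    "enterprise": {"core", "marketing", "sales", "data"},
}
KNOWN_BUNDLES = {"core", "marketing", "sales", "product", "data", "gtm", "dev"}


def resolve_skills(tier: str, vertical: Optional[str], addons: dict) -> list:
    """Union of tier minimum, vertical bundle, and enabled addons; rejects unknown bundles."""
    bundles = {"core"} | TIER_MINIMUMS[tier]
    if vertical:
        bundles.add(vertical)
    bundles |= {name for name, enabled in addons.items() if enabled}
    unknown = bundles - KNOWN_BUNDLES
    if unknown:
        raise ValueError(f"unknown bundles: {sorted(unknown)}")
    return sorted(bundles)
```

Rejecting unknown bundle names at resolution time means a typo in a client definition fails before anything is rsynced to the Gateway.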

Secrets Management

SOPS + age encryption at rest in Git. The age private key lives in macOS Keychain. Decrypted only on the target Mac at apply time. Never logged. Never in OTEL spans.

Port Map

Port | Service | Layer
:3000 | OpenClaw Gateway | Agent Runtime
:3001 | Grafana Dashboards | Visualization
:3100 | Langfuse | LLM Observability
:3200 | Grafana Tempo | Trace Storage
:4000 | FrawdBot Engine | Security
:4001 | FrawdBot Dashboard | Security UI
:4317 | OTEL Collector gRPC | Internal
:4318 | OTEL Collector HTTP | Primary Ingress
:5050 | Token Broker | Provider Arbitrage
:5432 | PostgreSQL | Data Persistence
:6379 | Redis | Semantic Cache
:8000 | Coolify Dashboard | Deployment PaaS
:8123 | ClickHouse | OLAP Analytics
:9090 | Prometheus | Metrics Storage
:9093 | AlertManager | Alert Routing
:11434 | Ollama | Local Inference

Metric Namespaces

Seven metric namespaces cover the full stack:

Deployment: The entire stack deploys via Coolify (:8000), a self-hosted PaaS managing Docker containers. Stack templates cover AI (OpenClaw+Ollama+Router), Cache (Redis+Broker), Observability (Prometheus+Grafana+Tempo+OTEL), Data (PostgreSQL+ClickHouse+Langfuse), and Security (FrawdBot+AlertManager). Caddy reverse proxy handles SSL termination with auto-routing. Git-push deploys. One-click client provisioning.
