The Observability Architecture

The Eight-Layer Stack

The observability architecture spans eight distinct layers, each with specific responsibilities. Data flows from instrumentation (Layer 1) through collection and runtime down to storage (Layer 6), while security monitoring operates as a cross-cutting concern with access to every layer.

LAYER 1: CLAUDE CODE HOOKS (Developer Workstation)
  PostToolUse (async) → Write|Edit|Bash → file changes + test results
  Stop (agent) → task completion verification + test pass rate
  SessionEnd → session duration, tool calls, git diff stats
  PreCompact → context utilization %, token burn rate
  Emitter: .claude/hooks/otel-emit.js → @opentelemetry/sdk-node
                │
                │ OTLP/HTTP :4318
                ▼
LAYER 2: OTEL COLLECTOR (Mac Mini M4)
  Receivers → Batch Processor (5s / 100) → Exporters
  Fan-out to Tier 1 (infra) and Tier 2 (LLM)
                │
┌───────────────┼───────────────┐
▼               ▼               ▼
LAYER 3: AGENT RUNTIME
  OpenClaw Gateway :3000
  ClawRouter (14-dimension scoring, <1ms)
  WhatsApp · Telegram · Discord · Web API · iMessage · Signal
  Ollama :11434 → SIMPLE(3B) · MEDIUM(8B) · COMPLEX(70B) · CODE(7B)
  diagnostics-otel → GenAI semantic convention spans
                │
                ▼
LAYER 4: TOKEN FLOAT CACHING
  Semantic Cache (Redis :6379) → exact match + vector similarity (0.92)
  Token Broker :5050 → provider arbitrage across 6+ providers
  Token Pool Manager → pre-purchased bulk pools + spot pricing
  Cache hits = $0 cost = pure margin → funds hardware leases
                │
                ▼
LAYER 5: KPI DASHBOARDS
  Prometheus :9090 → time-series metrics, 15s scrape
  Grafana Tempo :3200 → distributed traces, TraceQL
  Langfuse :3100 → trace explorer, prompt mgmt, eval scores
  PostHog → product analytics, feature flags, HogAI queries
  Grafana :3001 → Cost | Tokens | Routing | SLA | Cache | Float
                │
                ▼
LAYER 6: DATA PERSISTENCE
  PostgreSQL :5432 → langfuse_db · posthog_db · grafana_db · frawdbot_db · token_float_db
  ClickHouse :8123 → metrics · events · traces · fraud analytics · cache analytics
  S3 Cold Storage → parquet export · backups · data lake · forensic archive
                │
                ▼
LAYER 7: FRAWDBOT SECURITY (cross-cutting)
  Agent Fraud → rogue tools, skill tampering, loop exploitation
  Insider Threat → off-hours access, privilege escalation, data exfil
  Cost Fraud → token laundering, tier manipulation, budget bypass
  Prompt Security → injection detection, jailbreak, PII exfiltration
  Cache Fraud → cache poisoning, pool draining, rate manipulation
                │
                ▼
LAYER 8: CLAWHERD FLEET ORCHESTRATOR
  Git repo as source of truth → configs, skills, fleet definitions
  Pull-based agent → clawherd-agent syncs every 5 min via launchd
  Roles: control-plane · dev-workstation · client-node · studio-lease
  SOPS + age encryption → secrets in git, decrypted only on target

Layer 1: Claude Code Hooks

Every developer interaction with Claude Code generates telemetry. Four hook types capture different event classes:

PostToolUse — fires asynchronously on Write, Edit, and Bash operations. Captures file changes and test results without blocking the developer workflow.
Stop — agent-type hook that verifies task completion. Checks test pass rate before allowing the session to end.
SessionEnd — logs session duration, total tool calls, and git diff statistics for productivity tracking.
PreCompact — captures context window utilization percentage and token burn rate before context compaction occurs.

All hooks feed into .claude/hooks/otel-emit.js, which converts JSON stdin into proper OpenTelemetry spans using @opentelemetry/sdk-node. Spans ship to the OTEL Collector via OTLP/HTTP on port 4318.
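As a rough illustration of the conversion step, the sketch below wraps one hook payload in an OTLP/JSON trace export body. It is written in Python for brevity and is not the contents of otel-emit.js, which uses @opentelemetry/sdk-node. The payload field names (hook_event_name, session_id, tool_name), the attribute keys, the service name, and the scope name are all assumptions made for the example.

```python
import json
import secrets
import time


def hook_event_to_otlp(payload: dict, now_ns: int) -> dict:
    """Wrap one hook payload in an OTLP/JSON trace export body."""
    # Assumption: these payload field names are illustrative only.
    event = payload.get("hook_event_name", "unknown")
    attributes = [
        {"key": "claude.hook.event", "value": {"stringValue": event}},
        {"key": "claude.session.id",
         "value": {"stringValue": payload.get("session_id", "")}},
    ]
    if "tool_name" in payload:
        attributes.append(
            {"key": "claude.tool.name",
             "value": {"stringValue": payload["tool_name"]}}
        )
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes as 32 hex chars
        "spanId": secrets.token_hex(8),    # 8 random bytes as 16 hex chars
        "name": f"claude.hooks.{event}",
        "kind": 1,                         # SPAN_KIND_INTERNAL
        "startTimeUnixNano": str(now_ns),  # 64-bit ints are JSON strings
        "endTimeUnixNano": str(now_ns),
        "attributes": attributes,
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name",
                 "value": {"stringValue": "claude-code-hooks"}},  # hypothetical
            ]},
            "scopeSpans": [{
                "scope": {"name": "otel-emit-sketch"},  # hypothetical
                "spans": [span],
            }],
        }]
    }


# Simulate the JSON a hook would receive on stdin.
sample = json.loads(
    '{"hook_event_name": "PostToolUse", "session_id": "abc", "tool_name": "Edit"}'
)
body = hook_event_to_otlp(sample, time.time_ns())
print(json.dumps(body, indent=2))
```

A body of this shape is what an OTLP/HTTP receiver accepts as a POST to /v1/traces on port 4318 with a JSON content type. The sketch stops short of sending it.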

Layer 2: OTEL Collector

The OTEL Collector is the central nervous system of the stack. It receives telemetry from both Claude Code Hooks and OpenClaw's diagnostics-otel plugin. A batch processor (5-second timeout, batches of up to 100 spans) smooths traffic before fan-out to downstream exporters.
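The size-or-timeout behaviour of the batch step can be sketched in a few lines. This is a simplified model for intuition, not the Collector's own batch processor. The class name SpanBatcher and the explicit now argument (used so that the example is deterministic) are assumptions.

```python
from typing import Callable, List, Optional


class SpanBatcher:
    """Flush when the batch is full or the oldest span has waited too long."""

    def __init__(self, export: Callable[[List[dict]], None],
                 max_batch: int = 100, timeout_s: float = 5.0):
        self.export = export
        self.max_batch = max_batch
        self.timeout_s = timeout_s
        self.buffer: List[dict] = []
        self.first_seen: Optional[float] = None

    def add(self, span: dict, now: float) -> None:
        if not self.buffer:
            self.first_seen = now  # start the timeout clock for this batch
        self.buffer.append(span)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def tick(self, now: float) -> None:
        """Called periodically; sends a partial batch once it is old enough."""
        if self.buffer and now - self.first_seen >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        batch, self.buffer = self.buffer, []
        self.first_seen = None
        self.export(batch)


exported: List[List[dict]] = []
batcher = SpanBatcher(exported.append)
for i in range(250):
    batcher.add({"id": i}, now=0.0)
batcher.tick(now=4.9)  # 50 spans waiting, not yet 5 seconds old
batcher.tick(now=5.0)  # timeout reached, the partial batch goes out
print([len(b) for b in exported])  # prints [100, 100, 50]
```

Two full batches leave as soon as they reach 100 spans, and the remaining 50 wait for the 5-second timeout, which is how a burst gets smoothed without holding data indefinitely.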

Two tiers of export: Tier 1 sends infrastructure metrics to Prometheus and traces to Tempo. Tier 2 routes LLM-specific data to Langfuse and product analytics to PostHog. This separation keeps infrastructure SRE dashboards fast while giving ML teams their own observability plane.

Layer 3: Agent Runtime

OpenClaw Gateway on port 3000 handles inbound messages from six channels: WhatsApp, Telegram, Discord, Web API, iMessage, and Signal. ClawRouter scores every request across 14 dimensions in under 1ms, routing to the optimal model tier.

Local inference through Ollama serves four model tiers: SIMPLE (3B parameters), MEDIUM (8B), COMPLEX (70B), and CODE (7B specialized). The diagnostics-otel plugin emits GenAI semantic convention spans for every inference call — model, tokens in/out, latency, and cost.
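To make the routing idea concrete, here is a minimal sketch of score-based tier selection. The source does not list ClawRouter's 14 dimensions, so the three dimensions, their weights, the score thresholds, and the is_code flag below are all hypothetical. Only the four tier names come from the text.

```python
# Hypothetical subset of scoring dimensions with illustrative weights.
WEIGHTS = {
    "prompt_length": 0.3,
    "reasoning_depth": 0.5,
    "tool_use": 0.2,
}


def route(features: dict, is_code: bool = False) -> str:
    """Map normalised features (each 0.0 to 1.0) to a model tier."""
    if is_code:
        return "CODE"  # assumption: code requests bypass the score
    score = sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    if score < 0.33:   # illustrative thresholds
        return "SIMPLE"
    if score < 0.66:
        return "MEDIUM"
    return "COMPLEX"


print(route({"prompt_length": 0.1}))                 # prints SIMPLE
print(route({"reasoning_depth": 0.8}))               # prints MEDIUM
print(route({"prompt_length": 1.0, "reasoning_depth": 1.0, "tool_use": 1.0}))  # prints COMPLEX
print(route({}, is_code=True))                       # prints CODE
```

A weighted sum over precomputed features is cheap enough to fit a sub-millisecond budget, which is one plausible reason to favour this kind of scoring over a model-based classifier.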

Layer 4: Token Float Caching

Layer 4 is the economic engine. Three caching strategies combine to cut redundant inference costs:

Exact Match — hash lookup in under 1ms. Identical prompts return cached responses.
Semantic Cache — vector similarity using all-MiniLM-L6-v2 embeddings at 0.92 threshold. Similar-enough prompts hit cache.
KV Cache (LMCache) — reuses computed key-value pairs for shared prefix optimization in local models.

Combined hit rate target: 30-50%. Every cache hit costs $0 — pure margin.
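The first two strategies can be sketched as a two-stage lookup: an exact hash match first, then a similarity search against stored embeddings. The sketch keeps everything in memory and takes embeddings as plain lists, whereas the stack described here uses Redis and all-MiniLM-L6-v2. The class name, the use of SHA-256 and cosine similarity, and the toy two-dimensional vectors are assumptions for illustration.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoStageCache:
    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.exact = {}     # sha256(prompt) -> response
        self.semantic = []  # list of (embedding, response)

    @staticmethod
    def _key(prompt):
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def put(self, prompt, embedding, response):
        self.exact[self._key(prompt)] = response
        self.semantic.append((embedding, response))

    def get(self, prompt, embedding):
        # Stage 1: identical prompt text.
        key = self._key(prompt)
        if key in self.exact:
            return "exact", self.exact[key]
        # Stage 2: nearest stored embedding, accepted only above the threshold.
        best = max(self.semantic,
                   key=lambda entry: cosine(entry[0], embedding),
                   default=None)
        if best is not None and cosine(best[0], embedding) >= self.threshold:
            return "semantic", best[1]
        return "miss", None


cache = TwoStageCache()
cache.put("reset my password", [1.0, 0.0], "Use the reset link.")
print(cache.get("reset my password", [1.0, 0.0]))            # exact hit
print(cache.get("how do I reset my password", [0.95, 0.2]))  # semantic hit
print(cache.get("what is the weather", [0.0, 1.0]))          # miss
```

The linear scan over stored embeddings is only suitable for a demonstration. A production cache would use a vector index for the second stage.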

Token Float Economics: The Token Broker on port 5050 routes requests across providers at wholesale rates (Ollama, OpenRouter, Anthropic, OpenAI, Google, Groq) while billing clients at retail. The margin from cache hits + local inference + provider arbitrage funds Mac Studio hardware leases. Phase 1: 3 units at $900-1,200/mo. Phase 3: 15 units at $4,500-6,000/mo. ROI per unit: ~10-15x lease cost in client revenue.
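The lease figures can be checked with a little arithmetic. The sketch assumes the quoted monthly amounts are fleet totals rather than per-unit prices, which is the reading under which Phase 1 and Phase 3 imply the same per-unit cost. The revenue range is derived from the stated 10-15x multiple and is not a figure from the source.

```python
def per_unit(total_low: float, total_high: float, units: int) -> tuple:
    """Split a monthly fleet lease range into a per-unit range."""
    return (total_low / units, total_high / units)


phase1 = per_unit(900, 1200, 3)     # 3 units at $900-1,200/mo
phase3 = per_unit(4500, 6000, 15)   # 15 units at $4,500-6,000/mo

# Assumption: "10-15x lease cost" applies to the per-unit lease range.
revenue_per_unit = (10 * phase1[0], 15 * phase1[1])

print(phase1)            # prints (300.0, 400.0)
print(phase3)            # prints (300.0, 400.0)
print(revenue_per_unit)  # prints (3000.0, 6000.0)
```

Under these assumptions each unit leases for $300-400 per month and would need to carry roughly $3,000-6,000 per month in client revenue to meet the stated ROI.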

Layer 5: KPI Dashboards

Tier 1 — Infrastructure

Prometheus (:9090) — time-series metrics with 15-second scrape interval
Grafana Tempo (:3200) — distributed trace storage with TraceQL queries
AlertManager (:9093) — alert routing, deduplication, webhook receivers

Tier 2 — LLM Observability

Langfuse (:3100) — trace explorer, prompt version management, evaluation scores, cost analytics per trace
PostHog — product analytics, feature flags, HogAI natural language queries over event data

Grafana dashboards on port 3001 surface: Cost Savings, Token Burn Rate, Routing Tier Distribution, Task Completion Rate, Latency p95, SLA Uptime, Cache Hit Rate, and Float Balance.

Layer 6: Data Persistence

Store	Databases	Purpose
PostgreSQL :5432	langfuse_db, posthog_db, grafana_db, frawdbot_db, token_float_db	Transactional data, audit trails, billing ledger
ClickHouse :8123	Metrics, Events, Traces, Fraud Analytics, Cache Analytics	High-cardinality OLAP, materialized P&L views
S3 Cold Storage	Parquet, DB Backups, Data Lake, Forensic Archive, Billing Archive	Long-term retention, compliance, ML training data

Data lifecycle: hot (PostgreSQL, real-time) to warm (ClickHouse, analytical) to cold (S3 Parquet, archived). Daily partitioning with Snappy compression. Nightly pg_dump. Weekly ClickHouse backups to S3. Forensic archives are immutable with chain-of-custody logging.

Layer 7: FrawdBot Security

Five detection modules run continuously against all agent activity:

Agent Fraud — rogue tool usage, skill file tampering, loop exploitation, token abuse patterns
Insider Threat — off-hours access, privilege escalation, data exfiltration, config tampering
Cost Fraud — token laundering, tier manipulation, routing bypass, budget circumvention
Prompt Security — injection detection, jailbreak attempts, PII exfiltration signals
Cache/Broker Fraud — cache poisoning, pool draining, rate manipulation, billing fraud

Ingestion spans four sources: Langfuse API traces, direct PostgreSQL reads, Prometheus PromQL baselines, and AlertManager webhooks. Response actions: kill session (Gateway API), adjust trust score (ClawRouter), push alerts (AlertManager), write to forensic storage.
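One way to picture the response side is as an escalation ladder that maps a finding to the four actions named above. The source does not describe how FrawdBot chooses between them, so the Finding type, the 0.0 to 1.0 severity scale, and the thresholds in this sketch are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    module: str      # e.g. "prompt_security" (hypothetical identifier)
    severity: float  # 0.0 to 1.0, hypothetical scale
    session_id: str


def plan_response(finding: Finding) -> List[str]:
    """Choose response actions; thresholds are illustrative assumptions."""
    actions = ["write_forensic_record"]       # always keep evidence
    if finding.severity >= 0.3:
        actions.append("adjust_trust_score")  # via ClawRouter
    if finding.severity >= 0.6:
        actions.append("push_alert")          # via AlertManager
    if finding.severity >= 0.9:
        actions.append("kill_session")        # via Gateway API
    return actions


print(plan_response(Finding("cost_fraud", 0.4, "s-1")))
# prints ['write_forensic_record', 'adjust_trust_score']
print(plan_response(Finding("prompt_security", 0.95, "s-2")))
# prints ['write_forensic_record', 'adjust_trust_score', 'push_alert', 'kill_session']
```

Returning a plan instead of executing it keeps the decision logic testable without touching the Gateway, ClawRouter, or AlertManager.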

Layer 8: ClawHerd Fleet Orchestrator

Fleet configuration management for distributed Mac hardware. A Git repository serves as the single source of truth — configs, skill manifests, fleet definitions, and encrypted secrets all version-controlled.

Every Mac runs clawherd-agent, a pull-based daemon syncing every 5 minutes via launchd. The agent resolves its role from inventory.yaml, diffs desired state against actual state, and applies only what changed. Idempotent. No push. No SSH.
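The diff-and-apply loop can be sketched with plain dictionaries standing in for desired and actual state. This is a minimal model of the idea, not clawherd-agent's code. The function names, the flat key-to-version state shape, the example keys, and the removal of items absent from the desired state are assumptions.

```python
def plan_changes(desired: dict, actual: dict) -> dict:
    """Compute only the differences between desired and actual state."""
    to_apply = {
        key: value for key, value in desired.items()
        if key not in actual or actual[key] != value
    }
    # Assumption: anything no longer declared is removed.
    to_remove = sorted(key for key in actual if key not in desired)
    return {"apply": to_apply, "remove": to_remove}


def reconcile(desired: dict, actual: dict) -> tuple:
    """Return the new state and the plan that produced it."""
    plan = plan_changes(desired, actual)
    new_state = {k: v for k, v in actual.items() if k not in plan["remove"]}
    new_state.update(plan["apply"])
    return new_state, plan


# Hypothetical state: item name -> version.
desired = {"ollama": "0.5", "skills/core": "1.2.0"}
actual = {"ollama": "0.4", "legacy-tool": "0.1"}

state, first_plan = reconcile(desired, actual)
print(first_plan)
# prints {'apply': {'ollama': '0.5', 'skills/core': '1.2.0'}, 'remove': ['legacy-tool']}

_, second_plan = reconcile(desired, state)
print(second_plan)  # prints {'apply': {}, 'remove': []}
```

The empty second plan is what idempotent means in practice: once actual state matches desired state, another sync changes nothing.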

Roles

control-plane — full stack: all services, all skills, all models (Mac Mini M4)
dev-workstation — hooks + OTEL only (MacBook M1 Pro)
client-node — OpenClaw + Ollama + per-client skill profile
studio-lease — inherits control-plane + token float + remote metrics shipping

Skills Distribution

A registry.yaml marketplace catalog defines skill bundles: core (free), marketing, sales, product, data, gtm, dev. Per-client resolution combines tier minimums with vertical selection and addon toggles. Skills sync via rsync to ~/.openclaw/skills/ with hot-reload on the Gateway. Semver versioning with canary rollouts and automatic rollback on health check failure.
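Per-client resolution can be pictured as a set union over three inputs. The bundle names below come from the catalog description. The tier names, the tier-to-bundle minimums, and the function signature are hypothetical, since the source does not show the registry.yaml schema.

```python
KNOWN_BUNDLES = {"core", "marketing", "sales", "product", "data", "gtm", "dev"}

# Hypothetical tier names and minimum bundles per tier.
TIER_MINIMUMS = {
    "starter": {"core"},
    "growth": {"core", "marketing", "sales"},
}


def resolve_bundles(tier, verticals=(), addons=None):
    """Combine tier minimums, vertical selection, and enabled addons."""
    bundles = set(TIER_MINIMUMS[tier])
    bundles.update(verticals)
    bundles.update(name for name, enabled in (addons or {}).items() if enabled)
    unknown = bundles - KNOWN_BUNDLES
    if unknown:
        raise ValueError(f"unknown skill bundles: {sorted(unknown)}")
    return sorted(bundles)


print(resolve_bundles("growth", verticals=["data"],
                      addons={"dev": True, "gtm": False}))
# prints ['core', 'data', 'dev', 'marketing', 'sales']
```

Because the result is a sorted union, the same inputs always resolve to the same bundle list, which keeps the subsequent sync step deterministic.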

Secrets Management

SOPS + age encryption at rest in Git. The age private key lives in macOS Keychain. Decrypted only on the target Mac at apply time. Never logged. Never in OTEL spans.

Port Map

Port	Service	Layer
:3000	OpenClaw Gateway	Agent Runtime
:3001	Grafana Dashboards	Visualization
:3100	Langfuse	LLM Observability
:3200	Grafana Tempo	Trace Storage
:4000	FrawdBot Engine	Security
:4001	FrawdBot Dashboard	Security UI
:4317	OTEL Collector gRPC	Internal
:4318	OTEL Collector HTTP	Primary Ingress
:5050	Token Broker	Provider Arbitrage
:5432	PostgreSQL	Data Persistence
:6379	Redis	Semantic Cache
:8000	Coolify Dashboard	Deployment PaaS
:8123	ClickHouse	OLAP Analytics
:9090	Prometheus	Metrics Storage
:9093	AlertManager	Alert Routing
:11434	Ollama	Local Inference

Metric Namespaces

Seven metric namespaces cover the full stack:

openclaw.* — tokens, cost.usd, run.duration_ms, context.tokens, message.processed
claude.hooks.* — events_total, tasks_done, test_pass_rate, files_modified, session_dur_s, context_util
clawrouter.* — decisions (by tier/model), score_ms, fallbacks
frawdbot.* — threats_detected, agent_trust_score, quarantine_actions, prompt_injections, exfil_attempts
cache.* — exact_hits, semantic_hits, misses, hit_rate, savings_usd, latency_saved_ms
broker.* — requests_routed, cost_per_request, retail_rate, wholesale_rate, margin_per_request, pool_remaining
float.* — monthly_gross, lease_coverage, cache_contribution, local_contribution, arbitrage_contribution
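Some of these metrics are presumably derived from others in the same namespace. The sketch below shows two plausible derivations, assuming cache.hit_rate is hits over total lookups and broker.margin_per_request is retail minus wholesale. The source does not state either definition, and the sample numbers are invented for the example.

```python
def cache_hit_rate(exact_hits: int, semantic_hits: int, misses: int) -> float:
    """Fraction of lookups served from cache (assumed definition)."""
    total = exact_hits + semantic_hits + misses
    return (exact_hits + semantic_hits) / total if total else 0.0


def margin_per_request(retail_rate: float, wholesale_rate: float) -> float:
    """Per-request margin in USD (assumed definition)."""
    return retail_rate - wholesale_rate


# Illustrative counts, not measured data.
rate = cache_hit_rate(exact_hits=200, semantic_hits=150, misses=650)
print(rate)  # prints 0.35
```

With these sample counts the hit rate is 35%, inside the 30-50% target range given for Layer 4.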

Deployment: The entire stack deploys via Coolify (:8000), a self-hosted PaaS managing Docker containers. Stack templates cover AI (OpenClaw+Ollama+Router), Cache (Redis+Broker), Observability (Prometheus+Grafana+Tempo+OTEL), Data (PostgreSQL+ClickHouse+Langfuse), and Security (FrawdBot+AlertManager). Caddy reverse proxy handles SSL termination with auto-routing. Git-push deploys. One-click client provisioning.
