Draining 657 Messages and Running Out of AI on the Way Down
Quota exhaustion is the most expensive way to prove your architecture is working.
The Synapse-L4 integration had been running in test mode for a few days — a Python sidecar emitting pre-validated anomaly events onto a Redis stream, waiting for Sentinel to pick them up and route high-confidence signals to Gemini for audit narrative generation. During testing, events came through slowly. Everything looked fine.
Then I switched to production mode, flushed the backlog, and within about four minutes I had drained 657 messages and exhausted the Gemini free-tier quota for the day. The catch-and-persist pattern meant nothing was lost: all 657 events hit the database, though most of them arrived without AI-generated narratives. But the pipeline was effectively non-functional for the rest of the day.
This is the story of how I fixed it in about twenty minutes, why I could, and what the system looks like now.
Synapse-L4: A Different Stream, A Different Shape
The Synapse-L4 sidecar is a Python process that runs anomaly detection on telemetry data and emits validated events — Axioms — when something looks wrong. An Axiom looks like this:
{
  "status": "critical",
  "metric_value": 94.2,
  "anomaly_score": 0.97,
  "source_id": "axm_00423",
  "emitted_at": "2026-04-01T14:22:11Z"
}
These are different from raw transactions. They’ve already been processed by the Synapse scoring engine. They come with an anomaly_score that Sentinel uses for routing: above 0.80, route to AI analysis; at or below 0.80, log it cheaply and move on.
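The routing rule is small enough to sketch directly. A minimal Python version, assuming the 0.80 threshold described above; the `route_axiom` helper name is mine, not the codebase's:

```python
AI_THRESHOLD = 0.80  # anomaly_score above this goes to AI narrative generation

def route_axiom(axiom: dict) -> str:
    """Decide where an Axiom goes: expensive AI analysis or a cheap log entry."""
    score = float(axiom["anomaly_score"])
    return "ai_analysis" if score > AI_THRESHOLD else "cheap_log"

route_axiom({"anomaly_score": 0.97})  # routes to AI analysis
route_axiom({"anomaly_score": 0.42})  # logged cheaply
```

The float cast matters more than it looks: Redis Stream field values arrive as strings, so a score of `"0.95"` has to be coerced before any numeric comparison.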
I gave Axioms their own stream (synapse:axioms), their own consumer group (axiom-workers), their own worker command (sentinel:watch-axioms), and their own reclaimer (sentinel:reclaim-axioms). Separate from the transaction pipeline entirely. They have different data shapes, different processing logic, different scaling needs. Mixing them into the transaction stream would mean every consumer branching on message type — that way lies a mess.
The Cross-Language Format Mismatch
The first real problem was a test/production format mismatch.
My test helpers for the Axiom consumer were emitting messages in JSON-wrapped format — a single data field containing the whole Axiom as a JSON string, because that was how the PHP transaction publisher worked and I’d copied the pattern:
XADD synapse:axioms * data '{"status":"critical","anomaly_score":0.95,...}'
The real Python sidecar emits flat key-value fields:
XADD synapse:axioms * status critical anomaly_score 0.95 metric_value 94.2 ...
Predis returns Redis Stream messages as a flat array: ['status', 'critical', 'anomaly_score', '0.95', ...]. The consumer was trying to JSON-decode the value of a data key that didn’t exist. Tests passed (because the test helper emitted the PHP format). Production failed (because the Python sidecar emitted the correct format).
The fix: parse the flat Predis array by zipping it into key-value pairs, cast anomaly_score and metric_value to floats explicitly, and update the test helper to emit the Python format. The lesson, which I’ve now learned twice: test helpers that mock external producers must match the actual producer’s wire format, not a convenient simplified version you made up.
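The actual fix is in PHP, but the shape of it translates directly. A sketch in Python, assuming the flat `[key, value, key, value, ...]` list as Predis delivers it; the function name and `FLOAT_FIELDS` constant are my illustration, not the real code:

```python
FLOAT_FIELDS = {"anomaly_score", "metric_value"}  # fields that must be numeric

def parse_flat_axiom(fields: list) -> dict:
    """Zip a flat [key, value, key, value, ...] stream entry into a dict,
    casting known numeric fields to float (stream values arrive as strings)."""
    if len(fields) % 2 != 0:
        raise ValueError("flat field list must have an even length")
    axiom = dict(zip(fields[0::2], fields[1::2]))
    for key in FLOAT_FIELDS & axiom.keys():
        axiom[key] = float(axiom[key])
    return axiom

raw = ["status", "critical", "anomaly_score", "0.95", "metric_value", "94.2"]
parsed = parse_flat_axiom(raw)
# parsed == {"status": "critical", "anomaly_score": 0.95, "metric_value": 94.2}
```

The even-length guard is cheap insurance: a truncated message fails loudly at the parse boundary instead of producing a silently shifted dict.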
The 1,146 Events Problem (That Wasn’t a Problem)
After fixing the format mismatch and draining the backlog, the terminal showed 657 messages processed and the dashboard showed 25 events in the compliance table. That’s a big discrepancy. My first thought was a write bug — something silently dropping events after the first page.
It was pagination. The table showed 25 rows because per_page: 25 is the default. The meta.total in the paginator response showed 1,146 — which is right, because many messages produce multiple compliance events, and some events came from previous test runs.
Before suspecting logic bugs: check the database directly.
php artisan tinker
>>> \App\Models\ComplianceEvent::count()
1146
Everything was there. The write path was fine. I’d spent twenty minutes debugging pagination.
Gemini Quota, and Why the Driver Abstraction Exists
The real problem: 657 Axioms processed in four minutes, roughly half of them routed to Gemini (above the 0.80 threshold), and the Gemini free-tier rate limit hit almost immediately. Narratives for those events: null. The catch-and-persist pattern saved the data: no events were lost, and routed_to_ai was set to true, but audit_narrative stayed null for anything that hit a 429.
The fix was a single environment variable change:
SENTINEL_AI_DRIVER=openrouter
ComplianceManager resolves the active driver from config. Both GeminiDriver and OpenRouterDriver implement the same ComplianceDriver interface: one method, analyze(array $data): array, returning {narrative, risk_level, policy_refs, confidence}. Switching backends is invisible to every caller.
OpenRouterDriver uses OpenRouter’s OpenAI-compatible endpoint with a free-tier Llama 3.3 model as the default. Same prompt template. Same expected response schema. I swapped the driver, reprocessed the null-narrative events by replaying the stream, and filled in the gaps.
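The abstraction itself is small. A Python analogue of the interface and the config-driven resolution (the real code is PHP inside ComplianceManager; the class and method names below mirror the article's description, and the stub return values are placeholders):

```python
from abc import ABC, abstractmethod

class ComplianceDriver(ABC):
    """One method, one contract: analyze event data, return a narrative payload."""

    @abstractmethod
    def analyze(self, data: dict) -> dict:
        """Return {narrative, risk_level, policy_refs, confidence}."""

class GeminiDriver(ComplianceDriver):
    def analyze(self, data: dict) -> dict:
        # A real implementation would call the Gemini API here.
        return {"narrative": "...", "risk_level": "high",
                "policy_refs": [], "confidence": 0.9}

class OpenRouterDriver(ComplianceDriver):
    def analyze(self, data: dict) -> dict:
        # Same contract, different backend: OpenRouter's OpenAI-compatible endpoint.
        return {"narrative": "...", "risk_level": "high",
                "policy_refs": [], "confidence": 0.9}

DRIVERS = {"gemini": GeminiDriver, "openrouter": OpenRouterDriver}

def resolve_driver(config: dict) -> ComplianceDriver:
    """Pick the active driver from config; callers never see which one."""
    return DRIVERS[config.get("ai_driver", "gemini")]()
```

Because every caller goes through `resolve_driver` and the shared contract, the production swap really is one line of config: change the driver name, and nothing downstream notices.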
The service manager pattern for AI drivers is the kind of abstraction that looks like over-engineering until the moment you need it. That moment came on day one of production load.
The observe Flag
One other thing worth mentioning: MCP tool calls and test runs should not pollute dashboard metrics or the transaction history table.
TransactionProcessorService::process() takes an observe: bool parameter. When false, it skips the Redis LPUSH to the live feed, skips the Cache::increment for metric counters, and skips the Postgres write to the transactions history table. The pipeline still runs and returns a result — it just doesn’t record anything.
MCP tool calls pass observe: false. Tests pass observe: false. Dashboard numbers reflect only real production activity, not agent queries or test runs. One parameter, no surprises, no if ($env !== 'testing') hacks scattered through the codebase.
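The observe flag is just a guard around side effects. A Python sketch of the idea, with in-memory stand-ins for the real sinks (the actual service is PHP; parameter names here approximate the description above):

```python
def process(transaction, observe=True, live_feed=None, metrics=None, history=None):
    """Run the pipeline; record side effects only when observe is True."""
    result = {"id": transaction.get("id"), "status": "processed"}
    if observe:
        if live_feed is not None:
            live_feed.append(result)  # stand-in for the Redis LPUSH to the live feed
        if metrics is not None:
            metrics["processed"] = metrics.get("processed", 0) + 1  # Cache::increment
        if history is not None:
            history.append(transaction)  # stand-in for the Postgres history write
    return result

feed, metrics, history = [], {}, []
process({"id": "txn_1"}, observe=True, live_feed=feed, metrics=metrics, history=history)
process({"id": "txn_2"}, observe=False, live_feed=feed, metrics=metrics, history=history)
# Only txn_1 was recorded; txn_2 still returned a result but left no trace.
```

The caller still gets a full result either way, which is exactly what an MCP tool call or a test wants: real pipeline behavior, zero footprint.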
What the System Looks Like Now
Three workers, two streams, one dashboard. The transaction pipeline processes raw events from any producer. The Axiom pipeline processes pre-validated anomaly signals from Synapse-L4. Both are backed by consumer groups with automatic recovery for zombie messages. The AI driver can be swapped with an environment variable. The dashboard shows a complete, paginated audit trail with auto-refresh.
The backlog is drained. The quota is no longer exhausted. The narratives are generating.
Now I wait for the next thing that breaks.
// comments via github discussions