Building a Compliance Engine That Doesn't Block the Web Server

The worst thing you can do in a web request is think.

I mean that literally. A web request should receive data, hand it off, and return. The moment you start doing real work in the request cycle — calling an AI API, running vector searches, waiting on network I/O — you’ve turned your HTTP server into a waiting room. Response times climb. Workers back up. Something somewhere starts timing out.

Sentinel L7 is a compliance monitoring system. Its job is to ingest events — transactions, anomaly signals, audit triggers — and decide whether they’re suspicious. That decision involves embedding vectors, querying a vector database, calling Gemini Flash for an audit narrative, and writing results to Postgres. None of that belongs in an HTTP request.

So from day one, I built it as three separate processes.

The Three-Process Model

The web process is thin. It serves the Inertia/React dashboard and hands off any incoming data to a Redis Stream. That’s it. No analysis, no AI calls, no blocking waits.

The worker process reads from that stream. For every event, it runs the full pipeline: embed → vector search → AI analysis → persist. It blocks, it waits, it does the slow stuff. Nobody’s waiting on it except the stream.

The reclaimer process is the safety net. It runs XAUTOCLAIM every 30 seconds with a minimum idle time of one minute, which takes over messages that were delivered to a worker but have sat unacknowledged for longer than that. Those are zombie messages: the worker probably crashed mid-processing. The reclaimer picks them up and reprocesses them.
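To make the reclaimer's rule concrete, here is a minimal pure-PHP sketch of the kind of filter it relies on: from a list of pending entries, pick the ones that have been idle longer than a threshold. The array shape, the function name `selectZombies`, and the sample IDs are all hypothetical. A real reclaimer would let Redis do this selection via XAUTOCLAIM rather than filtering in PHP.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative stand-in for the idle-time filter behind the reclaimer.
 * Returns the IDs of pending entries idle for longer than $minIdleMs.
 *
 * @param list<array{id: string, consumer: string, idleMs: int}> $pending
 * @return list<string>
 */
function selectZombies(array $pending, int $minIdleMs = 60_000): array
{
    $zombies = [];
    foreach ($pending as $entry) {
        if ($entry['idleMs'] > $minIdleMs) {
            $zombies[] = $entry['id'];
        }
    }
    return $zombies;
}

// Hypothetical snapshot of a pending list: one stuck entry, one healthy one.
$pending = [
    ['id' => '1700000000000-0', 'consumer' => 'worker-1', 'idleMs' => 95_000],
    ['id' => '1700000000500-0', 'consumer' => 'worker-2', 'idleMs' => 4_000],
];

print_r(selectZombies($pending));
```

Only the first entry qualifies here; the second is still within its one-minute grace period and is left with its worker.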

Three processes, clear responsibilities, independent scaling. It’s more moving parts than a monolith, but each part is predictable.

flowchart LR
  Event[Incoming event<br/>HTTP / sidecar] --> Web[Web process<br/>Inertia + React]
  Web -->|XADD| Stream[(Redis Stream)]
  Stream -->|XREADGROUP| Worker[Worker process<br/>embed → search → AI]
  Worker --> DB[(Postgres<br/>compliance_events)]
  Worker -->|XACK| Stream
  Reclaimer[Reclaimer process<br/>XAUTOCLAIM every 30s] -.->|recovers stuck msgs| Stream

Why Redis Streams and Not a Queue

Laravel has a perfectly good queue system. I’ve used it. But for this project I wanted the stream to be the source of truth, not just a transport layer.

Redis Streams give you cursor-based reads: each consumer tracks its own $lastId and asks for messages after that point. You can replay history. You can have multiple independent consumer groups reading the same stream for different purposes. The stream is a log, not a pipe.

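The critical detail: when reading in a loop, you start from $ (new messages only) on the first call, then advance your cursor to the last ID of each batch you receive. Forget to advance and one of two things happens: if you keep passing a fixed starting ID such as 0, you re-read the entire stream history on every iteration; if you keep passing $, you silently skip anything that arrived between calls. I forgot this exactly once.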
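The cursor discipline is easier to see against a toy stream. The sketch below uses an in-memory array as a stand-in for the stream and a hypothetical `readAfter` helper in place of a real XREAD call; none of these names come from the Sentinel codebase. For the sake of a deterministic demo it starts at 0-0 (the beginning) rather than $.

```php
<?php
declare(strict_types=1);

/** Compare two stream IDs of the form "<ms>-<seq>" numerically. */
function compareStreamIds(string $a, string $b): int
{
    [$aMs, $aSeq] = array_map('intval', explode('-', $a));
    [$bMs, $bSeq] = array_map('intval', explode('-', $b));
    return [$aMs, $aSeq] <=> [$bMs, $bSeq];
}

/**
 * In-memory stand-in for a cursor-based read: returns up to $count entries
 * whose ID is strictly greater than $lastId.
 *
 * @param array<string, string> $stream id => payload, in insertion order
 * @return array<string, string>
 */
function readAfter(array $stream, string $lastId, int $count): array
{
    $batch = [];
    foreach ($stream as $id => $payload) {
        if (compareStreamIds((string) $id, $lastId) > 0) {
            $batch[(string) $id] = $payload;
            if (count($batch) === $count) {
                break;
            }
        }
    }
    return $batch;
}

/** Drain the stream in batches, advancing the cursor after every message. */
function drain(array $stream, int $batchSize = 2): array
{
    $lastId = '0-0';
    $seen = [];
    while (($batch = readAfter($stream, $lastId, $batchSize)) !== []) {
        foreach ($batch as $id => $payload) {
            $seen[] = $payload;
            $lastId = (string) $id; // the step that is easy to forget
        }
    }
    return $seen;
}

echo implode(',', drain(['1-0' => 'a', '2-0' => 'b', '3-0' => 'c'])), PHP_EOL; // a,b,c
```

Remove the `$lastId = (string) $id;` line and `drain` never terminates: every call to `readAfter` returns the same first batch.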

The `BLOCK 0` Gotcha

The Redis XREAD BLOCK 0 command means “wait forever for a new message.” On a local Redis instance this works exactly as advertised. Against Upstash’s hosted Redis — which is a remote TCP connection — it times out after about five seconds with no messages, and your worker loop dies.

The fix is boring but important: use BLOCK 2000 (a two-second timeout) and loop continuously. When the timeout fires with no messages, the command returns null, the loop continues, and you try again. It’s long polling with a two-second ceiling rather than an indefinite block: messages that arrive mid-wait still come back immediately. Good enough.
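The shape of that loop, stripped of any Redis client, looks roughly like this. The `$read` callable is an assumed stand-in for a blocking read with a finite timeout that yields null when nothing arrives; `consume` and the `$maxIterations` bound are hypothetical and exist only so the demo terminates. A real worker would loop until it receives a shutdown signal.

```php
<?php
declare(strict_types=1);

/**
 * Poll-loop sketch: an empty read is not an error, just a reason to ask again.
 *
 * @param callable(): ?array    $read   returns a batch, or null on timeout
 * @param callable(string): void $handle processes one message
 * @return int number of messages handled
 */
function consume(callable $read, callable $handle, int $maxIterations): int
{
    $handled = 0;
    for ($i = 0; $i < $maxIterations; $i++) {
        $batch = $read();
        if ($batch === null || $batch === []) {
            continue; // timeout fired with nothing to do: loop and try again
        }
        foreach ($batch as $message) {
            $handle($message);
            $handled++;
        }
    }
    return $handled;
}

// Scripted replies standing in for the stream: timeout, one batch, timeout.
$replies = [null, ['evt-1', 'evt-2'], null];
$read = function () use (&$replies): ?array {
    return array_shift($replies);
};

$log = [];
$count = consume($read, function (string $message) use (&$log): void {
    $log[] = $message;
}, 3);

echo $count, ' handled: ', implode(', ', $log), PHP_EOL;
```

The two timeouts pass through the loop without side effects, which is the whole point: the worker survives a quiet stream.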

stateDiagram-v2
  [*] --> New: XADD
  New --> Pending: XREADGROUP claims
  Pending --> Processing: worker picks up
  Processing --> Acknowledged: XACK
  Processing --> Zombie: worker dies / times out
  Zombie --> Pending: reclaimer XAUTOCLAIM
  Acknowledged --> [*]

The Tier-3 Fallback

The pipeline has three tiers: vector cache hit, AI analysis on cache miss, and rule-based fallback if everything else fails. That third tier is ThreatAnalysisService — pure PHP, no network, no I/O. If the amount exceeds a threshold, it’s flagged. If not, it’s cleared. No nuance, but always a verdict.
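A rule of that kind fits in a few lines. The class below is a hypothetical illustration written for PHP 8, not the actual ThreatAnalysisService: the class name, the result shape, and the 10,000 threshold are all assumptions.

```php
<?php
declare(strict_types=1);

/** Hypothetical threshold rule: no network, no I/O, always a verdict. */
final class RuleBasedFallback
{
    public function __construct(private float $threshold) {}

    /** @return array{verdict: string, source: string} */
    public function analyze(float $amount): array
    {
        return [
            'verdict' => $amount > $this->threshold ? 'flagged' : 'cleared',
            'source'  => 'fallback',
        ];
    }
}

$fallback = new RuleBasedFallback(10_000.0); // assumed threshold

echo $fallback->analyze(25_000.0)['verdict'], PHP_EOL; // flagged
echo $fallback->analyze(120.0)['verdict'], PHP_EOL;    // cleared
```

Tagging the result with its source lets whatever consumes the verdict tell a rule-based answer apart from a cached or AI-generated one.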

flowchart TD
  In[Incoming event] --> Embed[Embed fingerprint]
  Embed --> Search{Vector cache hit?}
  Search -->|yes ≥ 0.90 sim| Tier1[Tier 1 — return cached verdict]
  Search -->|no| AI[Tier 2 — AI analysis + policy RAG]
  AI -->|success| Persist[Upsert cache + persist]
  AI -->|API error / quota| Tier3[Tier 3 — rule-based fallback<br/>pure PHP, no I/O]
  Embed -->|embedding API fails| Tier3
  Tier1 --> Ack[XACK]
  Persist --> Ack
  Tier3 --> Ack

I built it to fire rarely. It fires more than I expected — mostly when the embedding API is slow to respond and the retry budget runs out. The catch-and-persist pattern means no transaction is ever silently dropped: if the AI pipeline throws, the fallback runs, the result is persisted, and the stream message is acknowledged. The dashboard always has a complete picture.
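Catch-and-persist can be sketched as one function that takes every collaborator as a callable. This is an assumed shape, not the worker's real code: `processWithFallback`, the `fallback_reason` key, and the stand-in closures are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Run the AI path; on any failure fall back to rules. Either way the verdict
 * is persisted first and the stream message acknowledged second.
 */
function processWithFallback(
    string $messageId,
    array $event,
    callable $aiPipeline,
    callable $fallback,
    callable $persist,
    callable $ack
): array {
    try {
        $result = $aiPipeline($event);
    } catch (Throwable $e) {
        // Any failure in the AI path degrades to the rule-based verdict.
        $result = $fallback($event);
        $result['fallback_reason'] = $e->getMessage();
    }

    $persist($result);  // always record a verdict...
    $ack($messageId);   // ...and only then acknowledge the message

    return $result;
}

// Usage with stand-ins: the AI path fails, the fallback still yields a verdict.
$persisted = [];
$acked = [];

$result = processWithFallback(
    '1700000000000-0',
    ['amount' => 25_000.0],
    function (array $event): array {
        throw new RuntimeException('quota exceeded');
    },
    fn (array $event): array => [
        'verdict' => $event['amount'] > 10_000.0 ? 'flagged' : 'cleared',
        'source'  => 'fallback',
    ],
    function (array $row) use (&$persisted): void { $persisted[] = $row; },
    function (string $id) use (&$acked): void { $acked[] = $id; }
);

echo $result['verdict'], ' via ', $result['source'], PHP_EOL; // flagged via fallback
```

The ordering is deliberate in this sketch: if persisting throws, the acknowledgement never happens, so the message stays pending and the reclaimer gets another chance at it.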

Architecture Tests

One thing I added early: Pest architecture tests that enforce domain isolation. The core logic in App\Services\Sentinel\Logic is not allowed to import Http or Redis facades directly. All external I/O goes through injected interfaces.

arch('Sentinel Logic layer must not use Http or Redis facades directly')
    ->expect('App\Services\Sentinel\Logic')
    ->not->toUse(['Illuminate\Support\Facades\Http', 'Illuminate\Support\Facades\Redis']);

flowchart LR
  Console[Console commands] --> Logic
  Web[Web controllers] --> Logic
  Logic["App\Services\Sentinel\Logic<br/>(protected core)"]
  Logic -->|allowed| Ports[Injected ports<br/>ComplianceDriver, VectorStore, ...]
  Ports --> Adapters[Concrete adapters]
  Adapters --> Facades[Http / Redis facades]
  Logic -. forbidden by arch test .-> Facades

This caught two accidental facade imports during the MCP implementation phase — code that worked fine but would have been untestable in isolation. Architecture tests are cheap to write and embarrassing to fail. That’s the ideal property for a guardrail.
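Here is roughly what "testable in isolation" buys you. The interface name VectorStore appears in the diagram above, but its method signature below is a guess, and `InMemoryVectorStore` and `CachedVerdictLookup` are hypothetical classes invented for this sketch. The logic class depends only on the port, so a test can hand it an in-memory fake and never touch Redis or HTTP.

```php
<?php
declare(strict_types=1);

/** Port: the only thing the logic layer knows about vector search. */
interface VectorStore
{
    /** @param list<float> $embedding */
    public function findSimilar(array $embedding, float $minSimilarity): ?string;
}

/** Test double: cosine similarity over rows held in memory. */
final class InMemoryVectorStore implements VectorStore
{
    /** @param list<array{embedding: list<float>, verdict: string}> $rows */
    public function __construct(private array $rows) {}

    public function findSimilar(array $embedding, float $minSimilarity): ?string
    {
        foreach ($this->rows as $row) {
            if (self::cosine($embedding, $row['embedding']) >= $minSimilarity) {
                return $row['verdict'];
            }
        }
        return null;
    }

    private static function cosine(array $a, array $b): float
    {
        $dot = 0.0;
        $normA = 0.0;
        $normB = 0.0;
        foreach ($a as $i => $x) {
            $dot += $x * $b[$i];
            $normA += $x * $x;
            $normB += $b[$i] * $b[$i];
        }
        return ($normA > 0.0 && $normB > 0.0) ? $dot / sqrt($normA * $normB) : 0.0;
    }
}

/** Logic-layer class: no facades, just an injected port. */
final class CachedVerdictLookup
{
    public function __construct(private VectorStore $store) {}

    /** @param list<float> $embedding */
    public function lookup(array $embedding): ?string
    {
        return $this->store->findSimilar($embedding, 0.90); // 0.90 as in the tier diagram
    }
}

$lookup = new CachedVerdictLookup(new InMemoryVectorStore([
    ['embedding' => [1.0, 0.0], 'verdict' => 'flagged'],
]));

var_dump($lookup->lookup([0.99, 0.05])); // near-duplicate: cache hit
var_dump($lookup->lookup([0.0, 1.0]));   // orthogonal: cache miss
```

A production adapter implementing the same interface is where the Http or Redis facade would live, on the far side of the boundary the arch test protects.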

What This Looks Like Running

Three terminal panes. One serving the dashboard. One streaming events from the simulator (php artisan sentinel:stream --limit=100). One watching the consumer log print threat verdicts with timing annotations, noting whether each result came from the cache, the AI, or the fallback.