What Comes Next: Multi-Tenancy, Compliance Exports, and Closing the Loop to EventHorizon
A compliance system that can’t be audited is just an expensive alerting system.
That’s been the north star for this project: not just detecting suspicious activity, but producing artefacts that a compliance officer could actually use — narratives grounded in policy documents, a complete audit trail in Postgres, cross-references back to the originating events. The pipeline exists. The artefacts are there. There’s still work to do to make them useful to people other than me sitting in a terminal.
This is where the project stands, what I think comes next, and what I’ve actually learned.
What’s Half-Built
Multi-Tenancy
The placeholder for this is already in routes/web.php — a comment noting where tenant-scoped middleware should wrap the auth group. The shape is clear: every stream key gets a tenant prefix ({tenant}:sentinel:transactions, {tenant}:synapse:axioms), every dashboard query gets a tenant filter on the Postgres side, and the middleware resolves the tenant from the authenticated user’s organisation.
This is a clean architectural extension to what already exists. The consumer commands take a configurable stream key. The models have a tenant_id column. The missing piece is the resolution layer: how does a request know which tenant it belongs to, and how does that propagate to the stream consumer? Subdomain routing is the most natural answer in a Laravel context — tenant.sentinel.app resolves to the tenant record, which provides the stream key prefix.
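If I sketched that resolution layer today, it would look roughly like this. A minimal sketch, assuming the subdomain-routing approach and a Tenant model; the middleware, relationship, and config key names are all hypothetical:

```php
<?php
// app/Http/Middleware/ResolveTenant.php (hypothetical names throughout)

namespace App\Http\Middleware;

use App\Models\Tenant;
use Closure;
use Illuminate\Http\Request;

class ResolveTenant
{
    public function handle(Request $request, Closure $next)
    {
        // Resolve the tenant from the {tenant} subdomain parameter, falling back
        // to the authenticated user's organisation.
        $tenant = Tenant::where('slug', $request->route('tenant'))->first()
            ?? $request->user()?->organisation?->tenant;

        abort_if($tenant === null, 404);

        // Downstream code reads the prefix from config, so the stream consumers
        // end up on keys like "acme:sentinel:transactions".
        config(['sentinel.stream_prefix' => $tenant->slug]);
        app()->instance('tenant', $tenant);

        return $next($request);
    }
}
```

Wiring it up is a Route::domain('{tenant}.sentinel.app') group wrapped around the existing auth group, which is exactly where that placeholder comment in routes/web.php sits.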
I haven’t needed this yet. When there’s a second user, I’ll need it immediately.
Compliance Report Export
The audit trail in compliance_events is exactly what you’d want to export: timestamp, source ID, risk level, narrative, policy references, driver used, whether it was AI-routed. A CSV export endpoint filtered by date range and optional risk level would turn the database into a report.
This is straightforward to build and genuinely useful. A compliance officer should be able to pull a quarterly report without needing database access. The data is there; the endpoint isn’t.
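A sketch of that endpoint, assuming the ComplianceEvent model and a streamed CSV response; the route, query parameters, and column names are illustrative rather than final:

```php
<?php
// app/Http/Controllers/ComplianceExportController.php (hypothetical)
// GET /compliance/export?from=2025-01-01&to=2025-03-31&risk=high

namespace App\Http\Controllers;

use App\Models\ComplianceEvent;
use Illuminate\Http\Request;

class ComplianceExportController extends Controller
{
    public function __invoke(Request $request)
    {
        $request->validate([
            'from' => ['required', 'date'],
            'to'   => ['required', 'date'],
            'risk' => ['nullable', 'string'],
        ]);

        $query = ComplianceEvent::query()
            ->whereBetween('created_at', [$request->date('from'), $request->date('to')])
            ->when($request->filled('risk'), fn ($q) => $q->where('risk_level', $request->input('risk')))
            ->orderBy('created_at');

        return response()->streamDownload(function () use ($query) {
            $out = fopen('php://output', 'w');
            fputcsv($out, ['timestamp', 'source_id', 'risk_level', 'narrative', 'policy_refs', 'driver', 'ai_routed']);

            // Chunk so a quarter's worth of rows never sits in memory at once.
            $query->chunk(500, function ($rows) use ($out) {
                foreach ($rows as $row) {
                    fputcsv($out, [
                        $row->created_at->toIso8601String(),
                        $row->source_id,
                        $row->risk_level,
                        $row->narrative,
                        implode('; ', (array) $row->policy_references),
                        $row->driver,
                        $row->ai_routed ? 'yes' : 'no',
                    ]);
                }
            });

            fclose($out);
        }, 'compliance-report.csv', ['Content-Type' => 'text/csv']);
    }
}
```

Registered behind the same auth middleware as the dashboard, that is the quarterly report without database access.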
EventHorizon Deep-Link
Every compliance_event row has a source_id column that references the originating event in EventHorizon — the upstream event store that Synapse-L4 taps into. Right now, that column is populated but inert. It’s a string. It doesn’t link anywhere.
The deep-link would be a URL template that takes source_id and constructs a link to the EventHorizon event detail view. From the compliance dashboard, you’d click a row and land directly on the raw event that generated the anomaly. That’s the audit chain: from compliance narrative back to raw telemetry, traceable at every step.
The implementation is simple — a config value for the EventHorizon base URL, a computed property on the model, a link in the dashboard row. The design question is whether EventHorizon is a separate service or a separate domain in the same application. That’s still open.
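Roughly all of it, assuming EventHorizon exposes an event detail view keyed by ID; the config key, accessor name, and URL shape below are guesses:

```php
// app/Models/ComplianceEvent.php (excerpt)
// Assumes config('services.eventhorizon.base_url') is populated from an env var.
public function getSourceUrlAttribute(): ?string
{
    $base = config('services.eventhorizon.base_url');

    // No base URL configured, or no source_id: the column stays inert, as today.
    if (! $base || ! $this->source_id) {
        return null;
    }

    return rtrim($base, '/').'/events/'.urlencode($this->source_id);
}
```

The dashboard row then renders $event->source_url as a link whenever it is non-null, and the audit chain closes.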
Decisions Still Open
The 0.95 similarity threshold. ADR-0015 documents this as an open question. The threshold is deliberately strict — a false cache hit on compliance verdicts is worse than a miss — but empirical testing at 0.90 hasn’t happened yet. My expectation is that common transaction patterns (coffee shop, grocery, gas station) cluster tightly enough that 0.90 is safe for those categories, while edge cases near reporting thresholds need 0.95 or stricter. A per-category threshold is possible but adds complexity. The honest answer is I need more production data before I know where to land.
Amount representation in fingerprints. ADR-0002 is still open. The current buckets (micro/small/medium/large/very_large) work, but the boundaries are fixed. A $9.99 transaction and a $10.01 transaction land in different buckets (micro vs small) despite being nearly identical in risk profile. Softer bucket boundaries — or a continuous representation — might improve hit rate near the edges. I don’t have the data to know whether this matters in practice yet.
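For reference, the current behaviour has roughly this shape: fixed cutoffs, evaluated top to bottom. Only the $10 micro/small boundary is implied by the example above; the other thresholds here are placeholders, not the production values:

```php
// Illustrative fixed-boundary bucketing (boundaries other than $10 are made up).
function bucketAmount(float $amount): string
{
    return match (true) {
        $amount < 10     => 'micro',      // $9.99 lands here...
        $amount < 100    => 'small',      // ...$10.01 lands here
        $amount < 1_000  => 'medium',
        $amount < 10_000 => 'large',
        default          => 'very_large',
    };
}
```

A softer variant could emit both neighbouring tiers when an amount sits within a few percent of a cutoff, at the cost of extra fingerprints per event; whether that is worth the complexity is exactly the open question.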
Prompt versioning discipline. This is the one item on this list that has effectively settled itself. All LLM prompt templates live in prompts/ as versioned Markdown files. This has been more valuable than I expected: when narratives started looking off after a driver switch, I could diff the prompt files and immediately see whether anything had changed. The versioning overhead is low (increment a number, add a changelog line) and the diagnostic value is high. I'd call this a clear success and would do it from day one on any future LLM project.
What I’ve Actually Learned
Semantic caching is a fingerprint design problem, not a similarity math problem. The vector operations are easy. Deciding which features of an event carry meaningful signal for similarity is hard and requires domain knowledge. Time-of-day buckets work because compliance analysis cares about when in the day, not the exact minute. Amount tiers work because compliance thresholds are categorical, not continuous. Get the fingerprint wrong and the math can’t save you.
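To make that concrete, here is the shape of a fingerprint built from categorical features only, reusing the bucketAmount helper from the earlier sketch. The field names are hypothetical; the point is that nothing continuous reaches the embedding:

```php
// Hypothetical fingerprint builder: only features with compliance signal survive.
function fingerprint(array $event): string
{
    $hour = (int) date('G', strtotime($event['occurred_at']));

    return implode('|', [
        $event['merchant_category'],             // e.g. "grocery"
        $event['channel'],                       // e.g. "card_present"
        bucketAmount((float) $event['amount']),  // tier, not the exact amount
        match (true) {                           // time-of-day bucket, not the minute
            $hour < 6  => 'overnight',
            $hour < 12 => 'morning',
            $hour < 18 => 'afternoon',
            default    => 'evening',
        },
    ]);
}
```

Two transactions that differ only in the exact amount and the exact minute produce the same fingerprint, which is what gives the similarity math something worth measuring.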
Silent failures in multi-stage pipelines are the hardest class of bug. Every stage that can return empty needs to log whether it returned empty. Zero RAG results is not an error; it’s a signal. Zero cache hits is not an error; it’s a signal. If you only log exceptions, you’ll never see the cases where a stage succeeds at returning nothing and all downstream stages gracefully handle the void.
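In code terms the fix is boring: log the absence, not just the exception. Something like this, with the retriever and variable names purely illustrative:

```php
use Illuminate\Support\Facades\Log;

// An empty retrieval is a signal, not an error; make it visible in the logs.
$passages = $policyRetriever->search($fingerprint);

if ($passages->isEmpty()) {
    Log::warning('RAG retrieval returned zero passages', [
        'source_id'   => $event->source_id,
        'fingerprint' => $fingerprint,
    ]);
}
```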
At-least-once delivery with Redis Streams is the right trade-off for this domain. You’d rather process an event twice (reclaimer picks up a zombie) than never process it. Idempotency on the write side (upsert, not insert) means duplicates are harmless. The reclaimer has fired a handful of times in real operation — always because a worker hit a quota error and exited mid-batch. The messages were recovered within 60 seconds.
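The write-side idempotency is one line of Eloquent, keyed on the originating event; the verdict fields here are illustrative:

```php
use App\Models\ComplianceEvent;

// Upsert keyed on source_id: reprocessing a reclaimed message overwrites
// the same row instead of creating a duplicate.
ComplianceEvent::updateOrCreate(
    ['source_id' => $message['source_id']],
    [
        'risk_level' => $verdict->riskLevel,
        'narrative'  => $verdict->narrative,
        'driver'     => $verdict->driver,
        'ai_routed'  => $verdict->aiRouted,
    ]
);
```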
The tier-3 fallback earns its existence. I expected the rule-based ThreatAnalysisService to be a rarely-used last resort. It fires on every Gemini rate-limit event, every network timeout, every embedding API blip. In a system that can’t drop events, the fallback isn’t a hedge — it’s load-bearing infrastructure. Design it to be correct, not just present.
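The dispatch logic itself is unglamorous. A sketch, with the catch deliberately broad and the method name assumed; in practice narrower exception types for rate limits and timeouts would be better:

```php
use Illuminate\Support\Facades\Log;

try {
    // Tiers 1 and 2: semantic cache, then the AI-routed analysis.
    $verdict = $aiDriver->analyse($event);
} catch (\Throwable $e) {
    // Tier 3: rule-based fallback. It has to be correct, because
    // "skip the event" is not an option in this pipeline.
    Log::warning('AI analysis failed, falling back to rules', ['error' => $e->getMessage()]);
    $verdict = $threatAnalysisService->analyse($event);
}
```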
Driver abstraction isn’t over-engineering when the domain requires it. The ComplianceDriver interface has one method. It took thirty minutes to write. It saved four hours when quota exhaustion required a same-day backend swap in production. The abstraction boundary existed for exactly the reason you’d hope.
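For the record, the whole abstraction is about this big. The method name, signature, and return type are guesses; the only constraint is that the interface has exactly one method:

```php
<?php

namespace App\Contracts;

use App\DataTransferObjects\ComplianceVerdict; // hypothetical DTO: risk level, narrative, policy refs

interface ComplianceDriver
{
    /**
     * Analyse one event and return a verdict. Gemini, any rival backend,
     * and the rule-based fallback all implement this.
     */
    public function analyse(array $event): ComplianceVerdict;
}
```

Swapping backends is then a container binding change, which is why the same-day swap was possible at all.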
Where This Goes
The pipeline is solid. The audit trail is complete. The missing pieces are UX and integration: exports for compliance officers, deep-links to source events, multi-tenancy for real deployments.
The interesting future work is on the intelligence side. The current system treats each event independently. A more capable version would correlate events over time — identifying patterns across the audit trail, surfacing recurring sources, generating periodic summary reports rather than only per-event narratives. That’s a different problem: not stream processing but longitudinal analysis. It would require a different kind of query and a different kind of prompt.
I’ll get there. For now, the dashboard is refreshing, the stream is flowing, and the narratives are grounded in actual policy text.
That’s enough.
// comments via github discussions