Graduated Backpressure: Moving Beyond Binary Pauses

Published: 2026-05-26
Reading time: ~7 minutes
Tags: sentinel-l7, Redis Streams, backpressure, distributed systems
Series: Part 2 of Sentinel-L7 Systems Patterns
Prev: Post #06: Self-Healing Worker Pools
Next: Post #08: Quality Scoring

The first version of Sentinel-L7’s producer guard had a threshold and a pause. When the stream depth exceeded the threshold, the producer stopped. When it dropped below, the producer ran full speed again. Binary: on or off.

This works. It also oscillates. The producer fills the stream to the limit, pauses completely, the consumer drains it, the producer fires back at full speed — and the cycle repeats. Consumer lag doesn’t smooth out; it bounces between a high-water mark and near-zero on a predictable sawtooth pattern.

This post is about replacing that on/off guard with something that actually moderates throughput.

Why XLEN Was the Wrong Signal

The original guard used XLEN, which returns the number of entries in the stream. The reasoning was simple: if the stream is deep, the consumer is behind; slow down.

The problem is what XLEN does and does not track. It counts every entry still in the stream and knows nothing about consumer groups: it changes only when entries are added, deleted, or trimmed, so it cannot tell a message that is waiting from one a worker has already claimed and is still processing. A Gemini call takes 4–8 seconds. For that whole time a shallow stream looks fine, even though the consumer is in the middle of expensive AI work and has no capacity for more.

The correct signal is XPENDING: the count of messages that have been delivered to a consumer but not yet acknowledged. That’s the actual work in flight. A high XPENDING count means the consumer has taken on more than it has finished. XLEN measures the queue; XPENDING measures the workload.
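To make "work in flight" concrete, here is a minimal sketch of a pending entries list. It is a toy in-memory model, not Redis and not Sentinel-L7 code; the class name ToyGroup and its methods are hypothetical and only mimic the read-then-acknowledge lifecycle described above.

```javascript
// Toy model of a consumer group's pending entries list (PEL).
// Hypothetical illustration only: just enough state to show when the
// pending count moves.
class ToyGroup {
  constructor() {
    this.backlog = [];        // entries not yet delivered to any worker
    this.pending = new Set(); // delivered but not yet acknowledged
  }

  add(id) {
    this.backlog.push(id);
  }

  // Like a group read: hand the next entry to a worker and track it as pending.
  read() {
    const id = this.backlog.shift();
    if (id !== undefined) this.pending.add(id);
    return id;
  }

  // Like an acknowledgement: the worker has finished, so the entry leaves the PEL.
  ack(id) {
    this.pending.delete(id);
  }

  pendingCount() {
    return this.pending.size;
  }
}

const group = new ToyGroup();
['1-0', '2-0', '3-0'].forEach((id) => group.add(id));

const first = group.read();
const second = group.read();
const inFlight = group.pendingCount(); // both entries are mid-processing

group.ack(first);                      // the slow call for `first` returns
const afterAck = group.pendingCount();
```

The pending count rises when a worker takes a message and falls only when that worker acknowledges it, which is why it tracks the consumer's actual load during a slow call.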

How the Lag Signal Works

After every readGroup + ack cycle, WatchTransactions writes the current XPENDING count to a Redis key:

$stream->writeLagKey($stream->pendingCount());

pendingCount() calls XPENDING in its summary form — XPENDING <stream> <group> — which returns the total count in one round-trip without enumerating individual messages. writeLagKey() stores the result with a 10-second TTL:

public function writeLagKey(int $count): void
{
    LRedis::set('sentinel:consumer_lag', $count, 'EX', 10);
}

The TTL is the important part. If the worker stops writing, because it crashed, restarted, or is idle between messages, the key expires. The producer reads an expired key as zero and publishes freely. This is intentional: a dead worker is already a problem, and the XAUTOCLAIM recovery path handles stranded messages. The producer should not also pause in that scenario. Publishing to a dead consumer just accumulates messages in the stream rather than in the PEL: a different kind of stuck, but not worse.
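The expiry behaviour can be sketched with an in-memory stand-in for the key. This is an illustration under stated assumptions, not the project's code: ExpiringLagKey, its methods, and the explicit clock argument are all hypothetical, chosen so the timing is easy to follow without a Redis server.

```javascript
// In-memory stand-in for a key written with a TTL.
// Hypothetical helper: the clock is passed in explicitly so expiry is
// deterministic and easy to trace.
class ExpiringLagKey {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.value = null;
    this.expiresAt = 0;
  }

  // Worker side: every write restarts the TTL.
  write(count, nowMs) {
    this.value = count;
    this.expiresAt = nowMs + this.ttlMs;
  }

  // Raw read: null once the TTL has elapsed, like reading an expired key.
  get(nowMs) {
    return nowMs < this.expiresAt ? this.value : null;
  }

  // Producer side: a missing key counts as zero lag.
  readLag(nowMs) {
    const raw = this.get(nowMs);
    return raw === null ? 0 : raw;
  }
}

const key = new ExpiringLagKey(10000);
key.write(120, 0); // worker reports 120 pending at t = 0

const fresh = key.readLag(4000);    // worker wrote recently, value is current
const expired = key.readLag(11000); // no write for 11 s, key has expired
const raw = key.get(11000);         // what a raw reader would see
```

The raw read and the producer read deliberately differ: the producer collapses "no key" to zero so it keeps publishing, while a reader that keeps the null can distinguish "no lag" from "no worker reporting".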

Two Thresholds, Two Responses

The producer reads the lag key before each XADD:

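// Read the lag the worker last reported (an expired key reads as zero).
$lag = $stream->readLagKey();

if ($lag > $lagPause) {
    // Hard limit: stop publishing and poll until lag is no longer above it.
    $this->warn("Consumer lag {$lag} above hard limit {$lagPause}, pausing until drained");
    while ($stream->readLagKey() > $lagPause) {
        usleep($lagPollMs * 1000); // usleep() takes microseconds
        if ($this->shouldStop) {
            break 2; // exit the spin-wait and the enclosing loop
        }
    }
} elseif ($lag > $lagWarn) {
    // Soft limit: one fixed sleep, then carry on publishing.
    $this->warn("Consumer lag {$lag} above soft limit {$lagWarn}, sleeping {$lagWarnSleep}ms");
    usleep($lagWarnSleep * 1000);
}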

Soft limit (lag_warn, default 50): A 500ms one-shot sleep. The producer slows down but doesn’t stop. Lag is elevated but not critical — insert a brief pause and continue. The consumer has time to catch up without the producer halting completely.

Hard limit (lag_pause, default 200): A spin-wait loop polling every 100ms until lag is no longer above the threshold. The producer is fully stopped, but it polls frequently so it resumes as soon as the consumer catches up. The spin-wait checks $shouldStop on every poll, so a SIGTERM during the wait exits cleanly rather than hanging.

The result is a graduated response: slightly elevated lag gets a nudge; seriously elevated lag gets a hard stop. The sawtooth pattern of the binary guard flattens into something closer to a steady state.
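The three zones can be written as one pure decision function, which makes the boundaries easy to check. This is a sketch in JavaScript rather than the producer's PHP; the function name decide and the config field names are hypothetical, while the default numbers are the ones quoted above.

```javascript
// Defaults quoted in the post; the field names are assumptions for this sketch.
const defaults = {
  lagWarn: 50,         // soft limit
  lagPause: 200,       // hard limit
  lagWarnSleepMs: 500, // one-shot sleep in the soft zone
  lagPollMs: 100,      // poll interval while hard-paused
};

// Map a lag reading to what the producer should do before its next publish.
// Both comparisons are strict, matching the producer guard.
function decide(lag, cfg = defaults) {
  if (lag > cfg.lagPause) {
    return { action: 'pause', pollMs: cfg.lagPollMs };
  }
  if (lag > cfg.lagWarn) {
    return { action: 'sleep', sleepMs: cfg.lagWarnSleepMs };
  }
  return { action: 'publish' };
}

const atSoftLimit = decide(50);    // exactly at the soft limit: still nominal
const justOverSoft = decide(51);   // soft zone
const atHardLimit = decide(200);   // exactly at the hard limit: still soft zone
const justOverHard = decide(201);  // hard zone
```

Keeping the decision separate from the sleeping makes the thresholds testable without timers, and it makes the strict comparisons visible: a reading equal to a limit stays in the lower zone.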

All four values live in config/sentinel.php under backpressure.* and are overridable via environment variables — no code change needed to tune the thresholds once the system is running under real load.

Making Lag Visible: The Dashboard Widget

A signal you can’t see is hard to reason about. The consumer lag key was always there; the dashboard widget just surfaces it.

The LagCard component reads consumer_lag from the props Inertia passes in on page load and colours it by zone:

function LagCard({ lag, warn, pause }) {
    let valueClass = 'text-emerald-400';
    let status     = 'nominal';

    if (lag === null) {
        valueClass = 'text-slate-500';
        status     = 'worker off';
    } else if (lag >= pause) {
        valueClass = 'text-red-400';
        status     = 'backpressure';
    } else if (lag >= warn) {
        valueClass = 'text-amber-400';
        status     = 'elevated';
    }
    // ...
}

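Green (nominal) → amber (elevated, soft limit in range) → red (backpressure, hard limit hit). A dash with “worker off” appears when the key has expired, which tells you immediately whether the worker is running at all. The thresholds shown in the UI (warn and pause) are passed from the controller so they stay in sync with config/sentinel.php; the widget doesn’t hardcode the numbers. One small mismatch: the widget compares with >= while the producer guard compares with >, so at exactly a threshold value the card has already changed colour while the producer has not yet reacted.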

This makes the backpressure system observable. You can watch the lag number climb when a slow Gemini call backs things up, see the producer log a warning, and watch the number settle back into green as the AI responses come in.

The Anti-Pattern This Avoids

The binary on/off pattern is a stop/go traffic light. It creates bursty throughput by design: the producer runs flat-out until it hits the wall, then stops completely, then runs flat-out again. The consumer gets flooded and then starved in alternating cycles. TCP congestion control works differently: it probes for capacity and adjusts its sending rate in response to congestion signals rather than oscillating between zero and maximum.

The two-threshold approach approximates that smoother curve with minimal complexity. It doesn’t require anything fancy — just two if branches and a sleep. The tradeoff is that the soft sleep (500ms) is a fixed delay rather than a proportional one. If lag is 51 and the warning threshold is 50, the producer sleeps 500ms regardless. A proportional sleep (lag / lag_warn * sleep_ms) would be more accurate but harder to reason about under load. Fixed thresholds are simple to tune and predictable to debug.
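The difference between the fixed and proportional delays is easy to see with numbers. The sketch below implements both using the formula quoted above (lag / lag_warn * sleep_ms); the function names are hypothetical, and rounding to whole milliseconds is an assumption of this sketch.

```javascript
// Fixed delay: the same sleep for any lag above the soft limit.
function fixedSleepMs(lag, lagWarn, sleepMs) {
  return lag > lagWarn ? sleepMs : 0;
}

// Proportional delay: scales with how far lag exceeds the soft limit.
// Rounded to whole milliseconds (an assumption of this sketch).
function proportionalSleepMs(lag, lagWarn, sleepMs) {
  return lag > lagWarn ? Math.round((lag / lagWarn) * sleepMs) : 0;
}

// Compare both strategies across the soft zone with the default settings.
const rows = [51, 100, 199].map((lag) => ({
  lag,
  fixed: fixedSleepMs(lag, 50, 500),
  proportional: proportionalSleepMs(lag, 50, 500),
}));
```

With the defaults, the fixed strategy sleeps 500ms at every point in the soft zone, while the proportional one grows from 510ms at lag 51 to 1990ms at lag 199. That extra responsiveness comes at the cost of a delay that changes with every reading, which is the debugging tradeoff described above.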

Q: Why does the lag key expire in 10 seconds rather than some longer window?
A: The worker writes it after every cycle. Under normal operation it refreshes several times per 10-second window. The TTL needs to be long enough that a single slow Gemini call (up to ~8s) doesn’t let the key expire mid-cycle and falsely signal zero lag. 10s gives that margin. If it were 60s, a crashed worker would leave a stale high-lag value blocking the producer for up to a minute after the crash, which is worse than the alternative (producer publishes freely while the XAUTOCLAIM path recovers stranded messages).

Q: Why not use a stream-based backpressure signal instead of a Redis key?
A: A second stream where the consumer publishes “ready” signals was considered (ADR-0023, Option C). It adds a coordination pattern with two more streams and more moving parts, and provides no meaningful advantage at current scale. A single GET/SET key is easier to reason about, cheaper, and the TTL-based expiry handles the dead-consumer case more cleanly than a second stream would.

Q: The widget reads lag on page load. Is it real-time?
A: No. It reflects the lag at the time the page was last loaded or refreshed. Making it real-time would require either a polling interval or a WebSocket push. For a portfolio system, page-load freshness is enough. Calling Inertia's router.reload() on an interval would make it auto-refresh if that became a requirement.
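
If polling did become a requirement, one possible shape is a small helper that calls a reload function on an interval and returns a cleanup function. This is a sketch, not part of Sentinel-L7: startLagPolling and its timers parameter are hypothetical, and the reload callback is injected so the helper has no framework dependency.

```javascript
// Call `reload` every `intervalMs` and return a function that stops the polling.
// `timers` is injectable so the helper can be exercised without real timers;
// by default it uses the global setInterval/clearInterval.
function startLagPolling(reload, intervalMs, timers = globalThis) {
  const handle = timers.setInterval(reload, intervalMs);
  return () => timers.clearInterval(handle);
}
```

In a React component this could be wired up as useEffect(() => startLagPolling(() => router.reload({ only: ['consumer_lag'] }), 5000), []), assuming router is imported from @inertiajs/react and that a 5-second interval is acceptable; both are assumptions about the project's setup. The returned cleanup function stops the polling when the component unmounts.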
