The hop-loss gap we closed in 24 hours

A competitor-adjacent founder publicly flagged an attribution gap in our edge cache layer. Here's exactly what was wrong, why it mattered, and the one-day commit that closed it — code paths included.

A founder building agentcolony.org/auditor/context — a diagnostic tool for "hop loss" in agent gateways — left a thoughtful comment on a dev.to comparison post we wrote. The question, paraphrased:

Does Prism's edge replication preserve request-context fields like workflow_id and conversation_id end-to-end, or does the downstream router rebuild them?

This is a sharp question. The "hop loss" pattern they're targeting is a well-known failure mode: a request enters at an upstream tagger with identifiers attached, gets parsed and forwarded by an intermediate hop, and arrives at the downstream writer where the identifiers either drift (two writers, different parsing) or disappear entirely (intermediate hop forgets to forward).

Their core claim is that most teams get stuck on per-tenant attribution because the fields don't survive the hop, so attribution ends up as "provider math, not request math." It's the right framing. We took it seriously and went to look at our own code.

This post is the audit + the fix. We shipped the fix the same day. The commit reference is at the bottom.

What Prism actually does (the honest version)

We don't have first-class workflow_id or conversation_id fields by those names. We have two adjacent things:

session_id — client-supplied via X-Prism-Session header. Drives server-side conversation memory (Upstash Redis, 24h TTL) and lands on usage_logs.session_id text. This is our conversation-thread analogue.
request_tags — client-supplied via X-Prism-Tags: feature=onboarding,team=growth. Stored as usage_logs.request_tags jsonb. This is our per-feature / per-tenant attribution surface.
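
As a minimal sketch of what a client sends, the helper below builds the two headers described above. The function name buildPrismHeaders is hypothetical and not part of any Prism SDK. Only the header names and the key=value,key=value tag format come from this post. Skipping pairs that contain a separator character is an assumption made to keep the sketch unambiguous.

```typescript
// Hypothetical helper: builds the two attribution headers described above.
// Only the header names and the "k=v,k=v" tag format come from the post.
function buildPrismHeaders(
  sessionId: string | null,
  tags: Record<string, string>,
): Record<string, string> {
  const headers: Record<string, string> = {};
  if (sessionId) {
    headers["X-Prism-Session"] = sessionId;
  }
  const pairs = Object.entries(tags)
    // Assumption: skip pairs this simple format cannot encode unambiguously.
    .filter(([k, v]) => k !== "" && v !== "" && !/[=,]/.test(k) && !v.includes(","))
    .map(([k, v]) => `${k}=${v}`);
  if (pairs.length > 0) {
    headers["X-Prism-Tags"] = pairs.join(",");
  }
  return headers;
}
```

Calling buildPrismHeaders("conv-123", { feature: "onboarding", team: "growth" }) yields an X-Prism-Session of conv-123 and an X-Prism-Tags of feature=onboarding,team=growth.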

For non-cached requests (~75-90% of typical traffic), the hop architecture is intentionally minimal:

Cloudflare Worker (edge)              EC2 Mumbai (origin)
─────────────────────────             ──────────────────
Reads only:                           Reads: every request header
  Authorization                       Writes usage_logs ONCE:
  X-Prism-Mode                          - session_id (parsed from header)
  X-Prism-Model-Prefer                  - request_tags (parsed from header)
  (request body for cache lookup)       - project_id (from auth)
                                        - org_id (from auth)
Ignores X-Prism-Session,
        X-Prism-Tags

Forwards Headers object UNTOUCHED
to origin via passthrough()

Critical detail: the worker does not parse or re-interpret these identity headers. They ride along inside the un-mutated Headers object handed to fetch(originUrl, init). Mumbai is the only parser and the only writer to usage_logs. For the non-cached path, there is no second writer to drift against.
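
The pass-through described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the worker's actual passthrough(): the signature, the pre-read bodyText parameter, and the injectable fetchImpl (there only so the sketch can run without a network) are all hypothetical.

```typescript
// Sketch of a pass-through hop: the edge never reads or rewrites the identity
// headers; it hands the request's own Headers object to the origin fetch.
function passthrough(
  request: Request,
  bodyText: string | null, // assumption: body was already read for the cache lookup
  originUrl: string,
  fetchImpl: typeof fetch = fetch, // injectable for testing only
): Promise<Response> {
  return fetchImpl(originUrl, {
    method: request.method,
    headers: request.headers, // no parse, no rebuild, no allow-list
    body: bodyText,
  });
}
```

The point of the shape is what it lacks: there is no step where X-Prism-Session or X-Prism-Tags is read into a variable and written back out, so there is nothing for the edge to forget to forward.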

We verified this empirically: grep -in "usage_logs\|insert.*usage" workers/prism-edge/src/*.ts returns zero matches. The edge worker doesn't touch the table.

So far, so good.

The gap (and it was real)

Edge cache hits did not write a usage_logs row at all.

When the worker served a cached response from Workers KV or Upstash Redis, the customer got back X-Prism-Edge-Cache: hit directly from the PoP — the request never reached Mumbai. The only bookkeeping was recordEdgeHit(), which bumped three Redis hash counters: total hits, saved cents, per-colo distribution. Keyed by account_id + date only. No session_id. No request_tags. No per-project breakdown for the cached hit.
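
To make the blind spot concrete, here is a rough reconstruction of that pre-patch bookkeeping. It is a sketch, not the real recordEdgeHit(): the signature, the key and field names, the single-hash layout, and the in-memory store standing in for Redis are all assumptions.

```typescript
// Minimal stand-in for a Redis hash client; only HINCRBY is needed here.
interface HashCounterStore {
  hincrby(key: string, field: string, by: number): void;
}

class InMemoryStore implements HashCounterStore {
  readonly data = new Map<string, Map<string, number>>();
  hincrby(key: string, field: string, by: number): void {
    const hash = this.data.get(key) ?? new Map<string, number>();
    hash.set(field, (hash.get(field) ?? 0) + by);
    this.data.set(key, hash);
  }
}

// Pre-patch bookkeeping as described: keyed by account + date only.
// Key and field names are assumptions for illustration.
function recordEdgeHit(
  store: HashCounterStore,
  accountId: string,
  date: string, // e.g. "2025-01-31"
  savedCents: number,
  colo: string,
): void {
  const key = `edge:${accountId}:${date}`;
  store.hincrby(key, "hits", 1);
  store.hincrby(key, "saved_cents", savedCents);
  store.hincrby(key, `colo:${colo}`, 1);
  // What is absent: session_id, request_tags, and project_id never reach the key.
}
```

Because the key carries only account and date, no later query can split these totals by feature or session. The information was never written.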

So a customer using X-Prism-Tags: feature=onboarding:

Origin-served request → row in usage_logs with request_tags.feature=onboarding → flows into per-feature attribution.
Edge-served cache hit → counter bump, no row → invisible to per-feature attribution.

That's not literally "the field drifts across hops" — it's the adjacent failure mode: the field disappears entirely for the cached slice, because we skipped the canonical write to keep edge hit latency under 100ms globally.

For workloads with 30-60% cache hit rate, the cached-at-edge slice is roughly 10-25% of total traffic. Per-feature attribution on the rest is accurate. On that slice: aggregate-only.

AgentColony's framing maps cleanly onto this real failure mode. We thanked them for the prompt.

The fix

One file, ~80 lines added, zero migrations, zero new dependencies. The patch lives at workers/prism-edge/src/index.ts.

What it does: when the worker serves a cache hit, after firing the existing Redis counter bumps, it also fires a usage_logs INSERT to Supabase via PostgREST. The row carries everything the origin would have written:

const row = {
  account_id:          auth.accountId,
  project_id:          auth.projectId,
  mode:                mode || "balanced",
  task_type:           "cache",        // sentinel — no routing happened
  model_used:          hit.model,
  provider:            "cache",
  tokens_in:           usage?.prompt_tokens ?? 0,
  tokens_out:          usage?.completion_tokens ?? 0,
  cost_provider_cents: 0,
  cost_total_cents:    0,
  latency_ms:          0,
  was_streaming:       false,
  success:             true,
  session_id:          request.headers.get("X-Prism-Session") || null,
  cache_status:        "hit-exact-edge",
  cache_saved_cents:   hit.savedCents || 0,
  request_tags:        parsePrismTags(request.headers.get("X-Prism-Tags")),
};

Three design choices worth calling out:

Fire-and-forget via ctx.waitUntil(). The customer's response already left the PoP before this INSERT begins. Zero added latency on the hot path. Cloudflare Workers' waitUntil budget is generous (30s soft); the INSERT typically completes in ~80ms.
5-second timeout cap via AbortSignal.timeout(5000). If Supabase is slow or unreachable, we abandon the row rather than block. The customer already got their cached response — losing the attribution row is preferable to leaving a half-open connection in the worker.
Tag-parsing discipline mirrors Mumbai's. We re-implemented parsePrismTags() in TypeScript using the same rules Mumbai uses in completions.py:
- max 10 keys (the rest dropped)
- max 64 chars per key/value (truncated, not rejected)
- empty key or empty value drops the pair
- returns null if nothing valid
Because both parsers apply the same rules, the row written from the edge should match byte-for-byte what Mumbai would have written for the same request. No drift surface.
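
The four rules above are enough to sketch a parser. The version below is an illustration, not the code from the commit. Several behaviors are assumptions the post does not specify: trimming whitespace around keys and values, splitting each pair on its first "=", dropping pairs with no "=", and letting a repeated key overwrite the earlier value without counting toward the limit.

```typescript
const MAX_TAG_KEYS = 10;
const MAX_TAG_LEN = 64;

// Sketch of the stated rules; edge-case handling is assumed, see lead-in.
function parsePrismTags(header: string | null): Record<string, string> | null {
  if (!header) return null;
  const tags: Record<string, string> = {};
  let count = 0;
  for (const part of header.split(",")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // assumption: a pair without "=" is invalid
    // Truncate to 64 chars rather than rejecting the pair.
    const key = part.slice(0, eq).trim().slice(0, MAX_TAG_LEN);
    const value = part.slice(eq + 1).trim().slice(0, MAX_TAG_LEN);
    if (key === "" || value === "") continue; // empty key or value drops the pair
    const isNew = !Object.prototype.hasOwnProperty.call(tags, key);
    if (isNew && count >= MAX_TAG_KEYS) continue; // keys past the tenth are dropped
    if (isNew) count += 1;
    tags[key] = value;
  }
  return count > 0 ? tags : null; // null if nothing valid
}
```

Whatever the real edge cases are, the property that matters is that the TypeScript and Python parsers agree on them. A shared fixture file of header strings and expected outputs, run against both, is one way to keep that true over time.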

We also moved this from "real gap on the candidate list" to closed in docs/competitive-gaps.md — opened-and-closed in the same week.

What this means in production

Three concrete things change for customers:

/dashboard/usage Requests tab now shows edge-cache hits as rows. Previously, a customer's request explorer skipped edge hits entirely. Now every cached request from any PoP appears as a row with provider=cache, cache_status=hit-exact-edge, and full tag attribution.
By-feature attribution now includes the cached slice. The /dashboard/usage → By feature tab (Pro/Team) sums cost + savings + hit counts broken out by request_tags.feature. Before this patch, edge-cache hits were invisible there. Now each one lands as a row, except for any INSERT abandoned at the 5-second timeout.
Conversation accounting is exact. A multi-turn conversation that happens to hit cache on turn 3 will still have all three turns row-logged with the same session_id. Before, turn 3 disappeared from the session's row-history (still in conversation memory; just not in the audit table).
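
The by-feature rollup mentioned above can be sketched as a simple fold over usage_logs rows. The column names are the ones used in this post. The function, the result shape, and the "(untagged)" bucket are hypothetical, and the real dashboard query may well run in SQL instead.

```typescript
// Subset of usage_logs columns relevant to the by-feature view.
interface UsageRow {
  request_tags: Record<string, string> | null;
  cost_total_cents: number;
  cache_saved_cents: number;
  cache_status: string | null;
}

interface FeatureTotals {
  requests: number;
  costCents: number;
  savedCents: number;
  edgeHits: number;
}

// Groups rows by request_tags.feature and sums cost, savings, and edge hits.
function summarizeByFeature(rows: UsageRow[]): Map<string, FeatureTotals> {
  const out = new Map<string, FeatureTotals>();
  for (const row of rows) {
    const feature = row.request_tags?.["feature"] ?? "(untagged)";
    const totals = out.get(feature) ?? { requests: 0, costCents: 0, savedCents: 0, edgeHits: 0 };
    totals.requests += 1;
    totals.costCents += row.cost_total_cents;
    totals.savedCents += row.cache_saved_cents;
    if (row.cache_status === "hit-exact-edge") totals.edgeHits += 1;
    out.set(feature, totals);
  }
  return out;
}
```

Before the patch, rows with cache_status of hit-exact-edge did not exist, so a rollup like this under-counted both requests and savings for any feature with cacheable traffic.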

The fix is fully backwards compatible — no schema migration, no new columns, no API contract change. Customers using the SDKs see no difference except more accurate dashboards.

The honest framing of the larger architectural choice

Worth saying out loud: Prism's design philosophy is one writer to the canonical request log. For origin-served requests, the worker doesn't write usage_logs; Mumbai does. The only reason this patch exists is that edge cache hits are the one path where Mumbai never gets the chance.

This is deliberate. Two writers to the same table (edge writes its view, Mumbai writes its view, batch job reconciles them) is the architecture that produces hop-loss drift in the first place — exactly the failure mode AgentColony's tool diagnoses. We avoided it for the 75-90% of traffic that goes through Mumbai. The 10-25% cached slice still has one writer (the worker), but it writes once with the parsed identifiers, not after a parse-forward-reparse cycle.

If we ever add a second writer for the same row (e.g. a downstream consumer that wants to update it), we'd need to think hard about which fields are owned by which writer. For now: every row on usage_logs is written by exactly one path, with exactly one parsing pass over the customer's headers. Drift surface remains zero by construction.

What we'd do differently next time

We should have caught this ourselves when the v1.6 edge cache shipped. The honest reason we didn't: the dashboard's primary cache surface (the savings tile, the hit-rate chart) sums Redis counter data, so it looked correct on first inspection. The breakdown-by-feature tab was newer (v1.3 observability), and we didn't write the cross-feature regression that would have caught the missing rows.

Concrete process change: every new code path that produces customer-visible aggregate numbers gets a "where do the per-request rows come from?" check. If the answer is "they don't, we sum from counters," that's a flag — counters can be right while attribution is wrong.
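
One way to turn that check into a regression test is to compare the counter total against the number of per-request rows for the same account and day. The sketch below is hypothetical and does not describe a test that exists in the repo. The tolerance parameter is an assumption, there to absorb rows abandoned at the INSERT timeout.

```typescript
interface AttributionCheck {
  counterHits: number;
  rowHits: number;
  missingRows: number;
  ok: boolean;
}

// Flags the case where an aggregate counter is not backed by per-request rows.
// counterHits: hits recorded by the counter for one account + day.
// rowHits: usage_logs rows with cache_status = "hit-exact-edge" for the same scope.
function checkCounterBackedByRows(
  counterHits: number,
  rowHits: number,
  tolerance = 0, // assumption: allowed shortfall from abandoned INSERTs
): AttributionCheck {
  const missingRows = Math.max(0, counterHits - rowHits);
  return { counterHits, rowHits, missingRows, ok: missingRows <= tolerance };
}
```

Run against pre-patch data, a check like this would have reported every edge hit as a missing row even though the savings tile looked correct.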

Footnote — credit where due

If you're building an agent gateway and worried about hop loss in your own stack, AgentColony's Auditor / Context is the diagnostic tool designed for exactly this. We're not affiliated. The founder pinged us with a sharp question, we audited our own code, and we shipped a fix the same day. That's a stronger outcome than if we'd just shrugged.

The commit hash is in docs/competitive-gaps.md Gap #9 if you want to read the actual diff. PRs welcome on workers/prism-edge/ — we'd love help finding the next gap before someone else does.

Q&A

Did you really ship in 24 hours, or is this marketing? The commit timestamp on 5262889 and the dev.to comment timestamp are within a few hours of each other. The fix was small because the architecture was right — one writer, one parsing pass, no envelope juggling — and it took an audit pass to find the one path (edge cache hits) where the rule wasn't being followed. The code change itself was ~80 LOC. The honest answer: the fix took an hour; the audit took the rest of the day.

What about edge-cache hits before this patch — is that data lost forever? Yes — there's no way to reconstruct per-tag attribution for cache hits that happened before this commit went live. The Redis counters retained the aggregate totals (cache savings, hit counts per account) so the dashboard's top-line numbers are unaffected. Only the per-feature / per-session breakdown for the pre-patch cached slice is irrecoverable. Sorry about that — it's a real consequence of having gone aggregate-only for that path.

Why didn't you just have the worker write to a separate edge_hits table and reconcile later? That's the dual-writer pattern that creates the drift problem we're trying to avoid. One writer to usage_logs keeps the invariant clean. The worker writing one row per cache hit is the minimal change that gets us there without introducing a reconciliation surface.

Does this affect latency on the hot path? No. The customer's response is sent before the INSERT begins. The INSERT runs in ctx.waitUntil with a 5-second cap. Workers' execution model lets the response stream complete while background work continues; we measured no change in p50/p95 on the cache-hit path.
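
For readers who want to see the shape of that background write, here is a minimal sketch. The function name recordEdgeHitToUsageLogs appears in the commit, but the signature, the environment binding names, the log messages, and the injectable fetchImpl are assumptions. The endpoint path and headers follow the standard Supabase PostgREST insert convention.

```typescript
// Minimal shape of the Workers execution context used here.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical binding names; the real worker's env may differ.
interface SupabaseEnv {
  SUPABASE_URL: string;
  SUPABASE_SERVICE_KEY: string;
}

function recordEdgeHitToUsageLogs(
  ctx: WaitUntilContext,
  env: SupabaseEnv,
  row: Record<string, unknown>,
  fetchImpl: typeof fetch = fetch, // injectable for testing only
): void {
  const insert = fetchImpl(`${env.SUPABASE_URL}/rest/v1/usage_logs`, {
    method: "POST",
    headers: {
      apikey: env.SUPABASE_SERVICE_KEY,
      Authorization: `Bearer ${env.SUPABASE_SERVICE_KEY}`,
      "Content-Type": "application/json",
      Prefer: "return=minimal", // no need to read the inserted row back
    },
    body: JSON.stringify(row),
    signal: AbortSignal.timeout(5000), // abandon the row rather than hang
  })
    .then((res) => {
      if (!res.ok) console.warn(`usage_logs insert failed: ${res.status}`);
    })
    .catch((err: unknown) => {
      console.warn("usage_logs insert abandoned:", err);
    });
  // Registers the promise as background work; the caller returns the cached
  // response without awaiting it.
  ctx.waitUntil(insert);
}
```

The function returns void on purpose: the caller has nothing to await, so the insert cannot end up on the response path by accident.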

Will you do this for other competitor-flagged gaps? Where we can ship the fix in a day and the customer-visible win is real, yes. Where it's a strategic gap (SOC 2, open-source self-host, fusion-mode quality) the calculus is different — those take weeks or months and require deliberate sequencing. But the small, sharp, easy-to-validate ones: ship them and write about it.

Where can I see the actual code change? workers/prism-edge/src/index.ts — look for recordEdgeHitToUsageLogs and parsePrismTags. The diff is on the main branch of github.com/ravirdp/prism (private repo today; the API key lookups are how we authenticate the worker against Supabase). Commit 5262889.