Question 1

Is speculative routing the same as multi-model synthesis / fusion mode?

Accepted Answer

No. Both fan out to N providers, but they select differently. Speculative routing takes the FIRST response (latency hedging). Fusion mode waits for ALL responses and synthesises across them with a judge model (quality wedge). Prism uses the same underlying dispatch primitive for both, with different selection logic on top.
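
To make the distinction concrete, here is a minimal Python sketch of one fan-out step with two selection strategies on top. It is not Prism's code: the `Provider` signature, `fan_out`, `speculative`, `fusion`, and the `judge` callable are all hypothetical names chosen for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical provider signature: takes a prompt, returns a completion.
Provider = Callable[[str], Awaitable[str]]


def fan_out(providers: Sequence[Provider], prompt: str):
    """Shared dispatch step: start one task per provider (needs a running loop)."""
    return [asyncio.create_task(p(prompt)) for p in providers]


async def speculative(providers: Sequence[Provider], prompt: str) -> str:
    """Latency hedging: return the first response to complete, cancel the rest."""
    tasks = fan_out(providers, prompt)
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # best-effort; the losers may already have done work
    # If the first task to finish failed, its exception is re-raised here.
    return next(iter(done)).result()


async def fusion(providers: Sequence[Provider], prompt: str, judge) -> str:
    """Quality-oriented: wait for every response, then let a judge combine them."""
    tasks = fan_out(providers, prompt)
    responses = await asyncio.gather(*tasks)
    return await judge(prompt, list(responses))
```

Both functions call the same `fan_out`; only what happens to the resulting tasks differs.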

Question 2

How does the loser's cancel work?

Accepted Answer

Best-effort: the gateway cancels the loser's asyncio task and closes its HTTP connection. The cancel takes a few hundred milliseconds to propagate over the HTTP connection; during that window the losing provider keeps generating tokens, and those tokens are billed. In practice this works out to roughly 1.3x effective token cost, averaged across all speculative calls.
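
The effect can be simulated without any real network calls. The sketch below is an assumption-laden toy, not the gateway's implementation: `fake_provider` stands in for a streaming HTTP request, the delays and token counts are made up, and `usage` simply counts tokens each side produced before the cancel landed.

```python
import asyncio


async def fake_provider(name, start_delay, usage, n_tokens=10, interval=0.01):
    """Simulated streaming call; usage[name] counts tokens produced so far."""
    usage[name] = 0
    try:
        await asyncio.sleep(start_delay)  # time to first token
        for _ in range(n_tokens):
            usage[name] += 1              # every produced token is billable
            await asyncio.sleep(interval)
        return name
    except asyncio.CancelledError:
        # A real gateway would close the HTTP connection here; the provider
        # may keep generating until it notices the close.
        raise


async def race(usage):
    tasks = [
        asyncio.create_task(fake_provider("primary", 0.0, usage)),
        asyncio.create_task(fake_provider("secondary", 0.05, usage)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()                     # best-effort cancel of the loser
    # Wait for the cancelled task to unwind so nothing is left dangling.
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


usage = {}
winner = asyncio.run(race(usage))
# Effective cost multiplier: all tokens paid for, divided by the winner's tokens.
overhead = sum(usage.values()) / usage[winner]
print(winner, usage, round(overhead, 2))
```

The loser's count is greater than zero even though its result is discarded, which is the source of the overhead. The exact multiplier printed here depends on the made-up timings and says nothing about the real 1.3x figure.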

Question 3

Is there an outage scenario where speculative routing makes things worse?

Accepted Answer

Mostly in cost, not in latency. If both providers are slow, speculative routing returns a slow response and pays for both calls. The cost is real but the impact is limited: the user was getting a slow response anyway, and the speculative version just costs more. The case where speculative routing is unambiguously worse is healthy traffic, where the primary would have responded in 500 ms and the speculative call costs 1.3x the tokens for no latency improvement.
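
The trade-off can be written down as a small Python comparison. This is only a sketch: it assumes the hedged latency is the faster of the two providers and applies the 1.3x average multiplier to every call. Apart from the 500 ms figure, the latencies below are hypothetical.

```python
def compare(primary_ms, secondary_ms, cost_multiplier=1.3):
    """Return (latency_saved_ms, cost_multiplier) for hedging one request."""
    hedged_ms = min(primary_ms, secondary_ms)  # first response wins
    return primary_ms - hedged_ms, cost_multiplier


print(compare(500, 650))    # healthy traffic   -> (0, 1.3)
print(compare(8000, 9000))  # both slow         -> (0, 1.3)
print(compare(8000, 600))   # only primary slow -> (7400, 1.3)
```

Only the third case buys any latency; in the first two the extra spend purchases nothing.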

Question 4

Why doesn't every paid request get speculative routing?

Accepted Answer

Cost. The 30% token overhead on every request is meaningful at scale: on a $5K/month bill, that's $1,500/month in hedging cost. Eco and balanced modes typically don't need it (cheap models are fast); sport mode does (the customer has already declared a preference for quality and speed over cost). Restricting speculative routing to sport mode on Pro+ keeps the cost where the value is.
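
The arithmetic and the gating rule fit in a few lines of Python. The function names and the `"sport"` / `"pro_plus"` identifiers are hypothetical stand-ins, not Prism's actual configuration keys.

```python
def hedging_cost_usd(monthly_bill_usd, overhead_pct=30):
    """Extra monthly spend if every request carries the hedging overhead."""
    return monthly_bill_usd * overhead_pct / 100


def should_speculate(mode, plan):
    """Hedge only where the customer has opted into paying for speed."""
    return mode == "sport" and plan == "pro_plus"


print(hedging_cost_usd(5000))                # 1500.0
print(should_speculate("sport", "pro_plus")) # True
print(should_speculate("eco", "pro_plus"))   # False
```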

Speculative routing

How it works

What it buys you

What it costs

When it makes sense

Failover vs speculative routing

How Prism implements it

Frequently asked questions

Related reading
