Multi-provider failover
Automatically routing a request to a backup AI provider when the primary returns an error or times out.
How it works
Multi-provider failover is the reliability mechanism by which an AI gateway, when its chosen provider fails or times out, automatically retries the request against a different provider that hosts an equivalent model. The failure is invisible to the caller: they get a successful response from a provider they didn't pick. The mechanism addresses provider outages, capacity issues, and transient errors that would otherwise propagate to the application as 5xx responses.
The simplest pattern: define a primary provider and a sequence of fallback providers per model class. On a failure (5xx response, timeout, connection error), dispatch to the next provider in the sequence. Repeat until success or until the fallback chain is exhausted. Modern gateways add provider health monitoring: recent failure rates per provider are tracked in a rolling Redis window, and providers currently marked unhealthy are skipped rather than retried.
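The sequential chain with health-based skipping can be sketched as follows. This is a minimal illustration, not any particular gateway's implementation: `ProviderError`, `HealthTracker`, `call_with_failover`, and the threshold values are hypothetical names and numbers, and the rolling window is kept in process memory as a stand-in for the Redis window described above.

```python
import time
from collections import deque


class ProviderError(Exception):
    """Hypothetical error for a 5xx response, timeout, or connection error."""


class HealthTracker:
    """Rolling window of recent call outcomes per provider (in-memory stand-in)."""

    def __init__(self, window_seconds=60.0, max_failure_rate=0.5, min_samples=5):
        self.window_seconds = window_seconds
        self.max_failure_rate = max_failure_rate
        self.min_samples = min_samples
        self._events = {}  # provider name -> deque of (timestamp, succeeded)

    def record(self, provider, succeeded):
        self._events.setdefault(provider, deque()).append((time.monotonic(), succeeded))

    def is_healthy(self, provider):
        now = time.monotonic()
        events = self._events.get(provider, deque())
        # Drop outcomes that have aged out of the window.
        while events and now - events[0][0] > self.window_seconds:
            events.popleft()
        if len(events) < self.min_samples:
            return True  # too little recent data to judge
        failures = sum(1 for _, ok in events if not ok)
        return failures / len(events) <= self.max_failure_rate


def call_with_failover(chain, request, health):
    """Try each (name, call) pair in order; return (name, response) on first success."""
    errors = []
    for name, call in chain:
        if not health.is_healthy(name):
            errors.append((name, "skipped: unhealthy"))
            continue
        try:
            response = call(request)
        except ProviderError as exc:
            health.record(name, succeeded=False)
            errors.append((name, str(exc)))
            continue
        health.record(name, succeeded=True)
        return name, response
    raise ProviderError(f"fallback chain exhausted: {errors}")
```

Each entry in `chain` is a `(name, callable)` pair, so the same function works for any number of fallbacks. Returning the provider name alongside the response lets the gateway log which provider actually served the request, even though the caller never sees the failure.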
Failover vs routing
Failover and routing are adjacent but distinct concepts. Routing is "which model should this request go to" — proactive selection on every request based on intent, task type, mode, or policy. Failover is "the model I sent it to is unhealthy, send it somewhere else" — reactive recovery after a failed attempt. A production AI gateway needs both: routing picks the primary model, failover handles the case where the primary doesn't respond cleanly.
The failover chain
The structural decision is what counts as an "equivalent model" for failover purposes. Three patterns:
Capability-tier matching. Models are grouped into capability buckets (for example small / medium / large / frontier). On failover, the gateway picks a model in the same bucket from a different provider. A Claude Sonnet failure fails over to GPT-4o; a GPT-4o-mini failure fails over to Claude Haiku. This is what Prism uses (v1.5 hardening pillar): a 6-bucket index keyed in `router.MODEL_CAPABILITY`.
Fixed equivalent mapping. Each model has a hand-coded fallback equivalent. Less flexible than capability-tier matching but easier to reason about for small catalogs.
No model swap, just retry. Fall back to a different provider hosting the same model (where multiple providers offer the same open-weights model). Common in OpenRouter-style aggregation across providers offering Llama or DeepSeek deployments.
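As a rough sketch of the first pattern, capability-tier matching reduces to a lookup over a catalog that records each model's provider and bucket. The catalog below is purely illustrative: the model keys, provider names, and bucket assignments are assumptions made for the example, and it is not the contents of `router.MODEL_CAPABILITY`.

```python
# Hypothetical catalog: model name -> (provider, capability bucket).
# Bucket assignments here are assumptions chosen to match the examples in the text.
CATALOG = {
    "claude-sonnet": ("anthropic", "large"),
    "gpt-4o": ("openai", "large"),
    "claude-haiku": ("anthropic", "small"),
    "gpt-4o-mini": ("openai", "small"),
}


def failover_candidates(failed_model, catalog=CATALOG):
    """Return models in the same bucket as failed_model, hosted by other providers."""
    failed_provider, bucket = catalog[failed_model]
    return [
        model
        for model, (provider, model_bucket) in catalog.items()
        if model_bucket == bucket and provider != failed_provider
    ]
```

With this catalog, `failover_candidates("claude-sonnet")` yields `["gpt-4o"]`. A fixed equivalent mapping would replace the bucket lookup with a plain dict from each model to its single hand-picked fallback.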
Streaming failover
The hard edge case. If a streaming response fails mid-stream (the provider drops the connection after returning partial tokens), failover is operationally complex: the gateway has to decide whether to abort the original stream, whether to start a fresh stream on the fallback provider, and how to communicate the change to the caller. Most production gateways skip mid-stream failover and instead fail cleanly to the caller, who can retry. Failover on non-streaming responses is the well-defined path.
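One way to express the "fail clean" policy is to allow failover only while no token has reached the caller, and to surface a distinct error once output has started. The sketch below assumes each provider is a generator function that yields tokens; `ProviderError`, `MidStreamFailure`, and `stream_with_failover` are hypothetical names for illustration.

```python
class ProviderError(Exception):
    """Hypothetical error for a failed connection, timeout, or dropped stream."""


class MidStreamFailure(Exception):
    """Raised when a provider fails after tokens were already sent to the caller."""


def stream_with_failover(chain, request):
    """Yield tokens from the first provider that starts streaming successfully."""
    last_error = None
    for name, open_stream in chain:
        emitted = False
        try:
            for token in open_stream(request):
                emitted = True
                yield token
            return  # stream completed normally
        except ProviderError as exc:
            if emitted:
                # Partial output is already with the caller: do not switch providers.
                raise MidStreamFailure(f"{name} dropped mid-stream") from exc
            last_error = exc  # nothing sent yet, safe to try the next provider
    raise ProviderError(f"fallback chain exhausted: {last_error}")
```

A failure before the first token behaves like ordinary failover. A failure after it raises `MidStreamFailure`, leaving the retry decision to the caller, which is the behavior the paragraph above attributes to most production gateways.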
Failover vs speculative parallel routing
A more aggressive pattern: speculative parallel routing fires the primary and the first fallback simultaneously on every request, returns whichever finishes first, and cancels the loser. It costs roughly 1.3x the token spend in exchange for p99 latency hedging. This is a different mechanism from failover, which is sequential. See speculative-routing for the deeper dive. Prism runs speculative routing on sport-mode requests for Pro+ accounts; failover applies to all paid traffic regardless of mode.
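The contrast with sequential failover is easiest to see in code. The following is a minimal asyncio sketch, assuming each provider is exposed as an async callable; `speculative_call` is a hypothetical name, and the choice to keep waiting for the second provider when the first one to finish has failed is an assumption of this sketch rather than something the text specifies.

```python
import asyncio


async def speculative_call(primary, fallback, request):
    """Run both providers concurrently and return the first successful response."""
    pending = {
        asyncio.create_task(primary(request)),
        asyncio.create_task(fallback(request)),
    }
    last_error = None
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_error = task.exception()
        raise last_error  # both providers failed
    finally:
        # Cancel whichever request is still in flight (the loser).
        for task in pending:
            task.cancel()
```

Both requests are issued up front, so the extra spend is incurred on every call, not only on failures. Cancelling the loser limits that cost but cannot recover tokens the slower provider has already generated.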