Three AI providers went down on the same day. Here's the architecture that didn't care.
On June 2, 2026, Claude, ChatGPT, and Grok all had outages in the same window. If your app calls one provider directly, your app went down too. Why single-vendor reliance is an architecture problem — and what health-weighted, cross-provider failover actually looks like.
On June 2, 2026, Claude, ChatGPT, and Grok all had outages inside the same window. Anthropic's status page showed a fix deployed by 10:42 UTC; the others recovered around the same stretch. For a lot of teams, that meant their own product was down — not because of anything in their code, but because they had wired their uptime to a single vendor's status page.
It's tempting to file this under "vendor problem." Anthropic was down. OpenAI was down. Bad day for them. But that framing is the trap, and it's worth saying plainly:
Single-vendor reliance on an LLM provider is an architecture problem, not a "which provider is reliable" problem.
Every major model provider has had an outage this year. There is no "reliable one" to switch to. If your answer to yesterday is "we should move to provider X," you've just picked a different status page to be hostage to. The teams that didn't feel June 2 weren't on a better provider — they had a different shape.
The shape that survives
The setup that shrugged off yesterday is a gateway sitting in front of multiple providers, with failover that reroutes a failing request to an equivalent-capability model on a provider that's still up. One provider 5xxs or times out, the request quietly lands somewhere else, and the user never sees it.
The naive version of this is a try/except that falls back from GPT to Claude. That mostly works until it doesn't — you fail over from a frontier model to a tiny one, or you hammer a provider that's already degraded, or you fail over to the provider that's actually down. Doing it well takes three pieces that aren't obvious until you've been paged for them.
1. Capability-bucket failover, not a hard-coded model map. You don't want "if GPT-5.4 fails, try Claude Opus." You want "this request needs a large reasoning model; here are the large reasoning models across every provider I hold a key for; route to a healthy one." We bucket the catalog into capability tiers — small / medium / large / frontier / code / reasoning / long-context — and fail over within the bucket, so the replacement is genuinely equivalent and you're not silently downgrading quality during an incident. (This replaced an O(N²) explicit model-to-model fallback map that got unmaintainable the moment we passed a handful of models.)
2. Health-weighted routing, so you stop sending traffic to a sinking provider. Failover that retries a dead provider on every request just turns one provider's outage into your latency spike. We keep a rolling window of each provider's recent success rate in Redis and weight routing by it: a provider with no recent history starts at full weight, a healthy one (≥95% success) stays at full weight, one that's degrading (≥50%) drops to a tenth of its weight, and one that's clearly down (<50%) drops to zero and gets skipped entirely until it recovers. The system routes around the outage instead of into it.
3. Optional hedging for the requests that can't wait. For latency-critical calls, racing two providers in parallel and taking the first to respond (cancelling the loser) turns a p99 tail — including a provider mid-wobble — into a p50. It costs roughly 1.3× tokens on the hedged calls, so it's a knob you turn on for the traffic that warrants it, not a default.
None of this is exotic. It's the boring infrastructure that the word "gateway" should imply but usually doesn't. We wrote up a concrete instance of it — routing around a 20-minute Anthropic outage — if you want the play-by-play.
The honest caveats
I build Prism (an OpenAI-compatible gateway that does the above), so take the framing with the appropriate grain of salt. And let me be honest about the limits, because over-claiming reliability is its own failure mode:
- A gateway is not magic. If you route every request to a single provider through a gateway, you've added a hop and kept your single point of failure. The win is failover across several providers you've actually wired up — not the gateway itself.
- A gateway is a dependency too. Ours runs its origin in a single region (Mumbai) today, fronted by a global edge. Cross-provider failover protects you from a provider outage; it does not make us, or any gateway, immune to our own. Anyone who tells you their proxy gives you 100% uptime is selling you something.
- Equivalent isn't identical. Failing over from one frontier model to another keeps you up, but the replacement will have its own quirks. For most production traffic that's a fine trade against being down; for output that's tightly tuned to one model, test it.
This is the same lesson the whole industry is learning
The reliability angle is the visceral one this week, but it rhymes with the cost angle. The same day as the outages, Microsoft unveiled in-house models at Build explicitly "to lessen reliance on OpenAI and lower costs." DeepSeek V4 is selling flagship-class output at $0.86 per million tokens — roughly 28× cheaper than the frontier incumbents at near-parity on coding benchmarks — and taking share precisely because teams want an exit from any single provider's pricing.
Uptime and cost are the same story told twice: don't bet your product on a single AI provider. Yesterday just made the reliability half hard to ignore.
So what should you actually do?
- If you're a hobby project or pre-traffic: you don't need this yet. Call one provider directly and move on. Premature failover is its own complexity tax.
- If you have real users and a real bill: put a gateway with genuine cross-provider, health-weighted, capability-bucketed failover between your app and the providers — buy it or build it, but build it properly if you build it. The
try/exceptversion will let you down on exactly the day you need it. - If you want to measure it before committing: Prism is OpenAI-compatible, so trying it is a base-URL change, and you can bring your own provider keys at zero markup — your keys, your bill, failover and caching layered on top. Point it at the providers you already pay for and see what the next outage feels like from behind it.
Don't let one provider's bad day be your bad day. There will be another one.
— Ravi Patel, founder, Prism by Ssimplifi