Question 1

What's the difference between edge inference and edge routing?

Accepted Answer

Edge inference runs the model itself at edge PoPs, so the inference happens close to the user. Edge routing puts only the proxy logic (auth, caching, request shaping) at the edge and forwards cache-miss requests to a central inference origin. Edge inference has lower latency on cache misses; edge routing has broader model availability, because it can route to any foundation model. For workloads using GPT-4o, Claude Sonnet 4, or Gemini Pro, edge inference isn't an option because those models run only in central clouds, which makes edge routing the practical answer.
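To make the edge-routing flow concrete, here is a minimal sketch of the proxy logic described above: authenticate, shape the request into a cache key, serve from cache on a hit, and forward to the central origin only on a miss. Everything named here (`EdgeRouter`, `fake_origin`, the in-memory dict cache, the key set) is hypothetical and stands in for whatever a real edge platform provides; it is not any vendor's API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


def fake_origin(model: str, prompt: str) -> str:
    """Stand-in for the central inference origin (assumption: a real one is a network call)."""
    return f"[{model}] reply to: {prompt}"


@dataclass
class EdgeRouter:
    """Toy edge-routing proxy: auth and cache at the edge, inference at the origin."""
    origin: Callable[[str, str], str]   # hypothetical central inference call
    api_keys: Set[str]                  # hypothetical set of valid API keys
    cache: Dict[str, str] = field(default_factory=dict)
    origin_calls: int = 0               # counts cache misses forwarded to the origin

    def _cache_key(self, model: str, prompt: str) -> str:
        # Request shaping: collapse whitespace so trivially different prompts share a key.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{model}\n{normalized}".encode()).hexdigest()

    def handle(self, api_key: str, model: str, prompt: str) -> str:
        # 1. Auth happens at the edge, before any origin traffic.
        if api_key not in self.api_keys:
            raise PermissionError("invalid API key")
        # 2. Cache lookup at the edge.
        key = self._cache_key(model, prompt)
        if key in self.cache:
            return self.cache[key]
        # 3. Cache miss: forward to the central inference origin and store the result.
        self.origin_calls += 1
        response = self.origin(model, prompt)
        self.cache[key] = response
        return response
```

With this sketch, two requests that differ only in whitespace reach the origin once; the second is answered from the edge cache. Only step 3 pays the round trip to the central origin, which is the latency cost edge inference would avoid.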

Question 2

Why isn't GPT-4o or Claude available for edge inference?

Accepted Answer

Both are closed-weight foundation models that the providers don't license for edge deployment. The providers run them in their own central GPU clusters and expose them only through their APIs. Open-weight models (Llama, Mistral, Qwen) can run at the edge because the weights are distributable; the proprietary models cannot.
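The distinction can be written down as a simple placement rule. The sketch below is illustrative only: the `Model` type, the `placement` function, and the small catalogue are hypothetical, and the open/closed flags just restate the examples in this answer. A real decision would also depend on model size and the hardware available at each PoP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    open_weights: bool  # True if the weights can be distributed to edge hosts


def placement(model: Model) -> str:
    """Return where inference for this model can run, per the rule in the text."""
    if model.open_weights:
        return "edge inference possible"
    return "edge routing only"


# Hypothetical catalogue mirroring the examples in the answer above.
CATALOGUE = [
    Model("Llama", open_weights=True),
    Model("Mistral", open_weights=True),
    Model("Qwen", open_weights=True),
    Model("GPT-4o", open_weights=False),
    Model("Claude", open_weights=False),
]

if __name__ == "__main__":
    for m in CATALOGUE:
        print(f"{m.name}: {placement(m)}")
```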

Question 3

Does Prism do edge inference?

Accepted Answer

No — Prism does edge routing. The proxy layer runs at Cloudflare's 300+ edge PoPs (auth, cache lookup, classification), but cache-miss requests are forwarded to Mumbai for the actual inference call. This matches what 95%+ of production workloads need because the models that matter are foundation models running in central clouds.

Question 4

When will edge inference be practical for big foundation models?

Accepted Answer

Probably 2027-2028 at scale, gated on (a) provider licensing (Anthropic or OpenAI letting Cloudflare, Vercel, or similar platforms host their weights) and (b) the open-weight gap closing further. The Llama 4 / Qwen 3 / Mistral Large class of models is approaching GPT-4-mini quality and can run at the edge today; the gap to the proprietary models above that tier is real but shrinking.
