AI gateway

A proxy that sits between applications and AI providers, handling routing, caching, observability, and governance for LLM API traffic.

How it works

An AI gateway is a proxy that sits between application code and AI providers (Anthropic, OpenAI, Google, and self-hosted models). It centralizes the cross-cutting concerns that every LLM-calling application eventually needs: authentication, request routing, response caching, observability, failover, governance, and cost attribution. Applications call the gateway as if it were the underlying provider; the gateway checks its cache, selects a provider, forwards the request, retries or fails over on errors, and logs the result.
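The request flow described above can be sketched in a few lines. This is a minimal illustration, not any product's implementation: `MiniGateway`, `always_down`, and `echo_backup` are hypothetical names, providers are modeled as plain Python callables instead of real HTTP clients, and the retry policy (a fixed number of attempts per provider, then failover to the next) is an assumption.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical provider adapter: takes a request dict, returns completion text.
Provider = Callable[[dict], str]


class MiniGateway:
    """Toy gateway: exact-match cache, per-provider retries, ordered failover."""

    def __init__(self, providers: List[Tuple[str, Provider]], max_attempts: int = 2):
        self.providers = providers          # priority order: first entry is tried first
        self.max_attempts = max_attempts    # attempts per provider before failing over
        self.cache: Dict[str, str] = {}     # exact-match response cache
        self.log: List[dict] = []           # one record per cache hit or upstream attempt

    @staticmethod
    def _cache_key(request: dict) -> str:
        # Canonical JSON so that key order in the request does not change the cache key.
        payload = json.dumps(request, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def complete(self, request: dict) -> str:
        key = self._cache_key(request)
        if key in self.cache:
            self.log.append({"event": "cache_hit"})
            return self.cache[key]

        last_error = None
        for name, call in self.providers:
            for attempt in range(1, self.max_attempts + 1):
                try:
                    text = call(request)
                except Exception as exc:  # a real gateway would retry only retryable errors
                    last_error = exc
                    self.log.append({"event": "error", "provider": name, "attempt": attempt})
                    continue
                self.cache[key] = text
                self.log.append({"event": "ok", "provider": name, "attempt": attempt})
                return text
        raise RuntimeError("all providers failed") from last_error


def always_down(request: dict) -> str:
    """Stand-in for a provider that is currently unavailable."""
    raise ConnectionError("upstream unavailable")


def echo_backup(request: dict) -> str:
    """Stand-in for a healthy fallback provider."""
    return "echo: " + request["prompt"]
```

Constructing `MiniGateway([("primary", always_down), ("backup", echo_backup)])` and calling `complete({"prompt": "hi"})` tries the primary twice, fails over to the backup, and caches the answer, so a repeated identical request is served without touching either provider.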

The architectural pattern is the API gateway pattern from service-oriented architectures (Kong, Tyk, Cloudflare API Gateway), adapted to the specific shape of LLM traffic. The differences lie in the cross-cutting concerns that LLM traffic surfaces: routing decisions are usually about model choice (cost vs. quality) rather than service discovery; caching is multi-layer (exact, semantic, provider-native) rather than driven by HTTP Cache-Control headers; and observability adds token and cost dimensions that ordinary API traffic doesn't have.
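To make the multi-layer caching idea concrete, here is a rough sketch of an exact layer backed by a semantic layer. Everything in it is illustrative: `LayeredCache` is a hypothetical name, the bag-of-words `embed` function stands in for a real embedding model, and the 0.8 similarity threshold is an arbitrary assumption. Provider-native caching happens on the provider's side and is not modeled here.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase words. A real gateway would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LayeredCache:
    """Exact-match lookup first, then nearest-neighbor lookup above a threshold."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold                       # assumed similarity cutoff
        self.exact: Dict[str, str] = {}                  # layer 1: identical prompts
        self.semantic: List[Tuple[Counter, str]] = []    # layer 2: similar prompts

    def put(self, prompt: str, response: str) -> None:
        self.exact[prompt] = response
        self.semantic.append((embed(prompt), response))

    def get(self, prompt: str) -> Tuple[Optional[str], str]:
        """Return (response, layer), where layer is 'exact', 'semantic', or 'miss'."""
        if prompt in self.exact:
            return self.exact[prompt], "exact"
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.semantic:   # linear scan; fine for a sketch
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return best_response, "semantic"
        return None, "miss"
```

The ordering matters: the exact layer is cheap and never returns a wrong answer, while the semantic layer trades some risk of a mismatched response for a higher hit rate, which is why its threshold is the main tuning knob.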

When it matters

An AI gateway becomes worth deploying when at least one of these is true: (a) the application calls more than one AI provider, and provider-specific SDKs are creating integration drift; (b) AI spend is non-trivial, and cost engineering (caching, model selection, budget caps) is worth the gateway's overhead; (c) compliance or governance requires auditable spend and policy enforcement that the provider SDKs don't offer; (d) latency or reliability matters, and you need automatic failover between providers.
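As an illustration of conditions (b) and (c), token-based cost attribution with a per-team budget cap might look like the sketch below. The model names and per-million-token prices are made up for the example, and `BudgetLedger` is a hypothetical name, not a feature of any particular gateway. A real system would also need to decide whether to charge on estimated tokens before the call or on actual usage after it.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ModelPrice:
    input_per_mtok: float    # USD per million input tokens (hypothetical figure)
    output_per_mtok: float   # USD per million output tokens (hypothetical figure)


# Hypothetical models and prices, for illustration only.
PRICES: Dict[str, ModelPrice] = {
    "small-model": ModelPrice(input_per_mtok=0.50, output_per_mtok=1.50),
    "large-model": ModelPrice(input_per_mtok=5.00, output_per_mtok=15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given token counts."""
    price = PRICES[model]
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


@dataclass
class BudgetLedger:
    """Tracks spend per team and refuses charges that would exceed the team's cap."""

    caps: Dict[str, float]                                  # team -> cap in USD
    spent: Dict[str, float] = field(default_factory=dict)   # team -> spend so far

    def charge(self, team: str, model: str, input_tokens: int, output_tokens: int) -> bool:
        cost = request_cost(model, input_tokens, output_tokens)
        used = self.spent.get(team, 0.0)
        if used + cost > self.caps.get(team, 0.0):   # teams without a cap get no budget
            return False                             # policy: reject, leave ledger unchanged
        self.spent[team] = used + cost
        return True
```

Because every request passes through the same ledger, the `spent` mapping doubles as the audit trail for spend per team, which is the part provider SDKs called directly from each application cannot give you in one place.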

When it doesn't matter: single-provider applications with low spend, where calling the provider's SDK directly (OpenAI's, for example) is the simpler architecture. The gateway adds roughly 5-30 ms of overhead per request and one more component to deploy and monitor; that overhead is only worth it when the gateway is doing meaningful work.

What an AI gateway is not

Common confusions worth disambiguating: an AI gateway is not a vector store (those are different products entirely, though some gateways integrate with them); it's not a model-hosting platform (Replicate, Together, Fireworks fall in that bucket); it's not a model-evaluation framework (LangSmith, Braintrust, Promptfoo are eval-focused). Some products span multiple categories — Helicone is gateway + observability; LiteLLM is gateway + abstraction layer — but the core "gateway" role is the proxy + cross-cutting concerns layer specifically.

The category today

As of 2026 the AI gateway category includes Prism, Portkey, Helicone, LiteLLM (open source), OpenRouter (multi-provider marketplace shape), Cloudflare AI Gateway (edge-native), Vercel AI Gateway (platform-bundled), and several others. The differentiation across products is mostly on which cross-cutting concerns they emphasize — observability-first vs caching-first vs cost-first vs routing-first. Most products do all four to some degree; the question is which one they lead with.

Frequently asked questions

Is an AI gateway the same as an LLM proxy or an LLM router?
Approximately yes — the three terms are used interchangeably in practice. 'AI gateway' is the most common vendor-marketing term; 'LLM proxy' is the most technically precise; 'LLM router' tends to emphasize the model-selection function specifically. They all describe the same architectural pattern: a proxy between application code and AI providers, centralizing cross-cutting concerns.
Why not just call OpenAI's SDK directly?
If you're single-provider with low spend and don't need caching, observability, or governance, you should. The SDK is simpler. AI gateways become worth deploying when you cross into multi-provider, multi-team, or cost-engineering territory; before then they're added overhead without the matching value.
How much latency does an AI gateway add?
It depends on the gateway. A well-designed gateway adds 5-30 ms when it's just proxying, and can save latency on cache hits (a cache-hit response can take 100-300 ms instead of 1-3 seconds). Edge-deployed gateways (Cloudflare AI Gateway, Prism's edge layer) can reduce the gateway's own overhead to about 5 ms by running the proxy logic in the data center nearest the caller.
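
The trade-off between proxy overhead and cache savings can be worked through with a small calculation. The sketch below assumes a simple model that the article does not spell out: every request pays the gateway overhead, cache hits then cost `hit_ms`, and misses cost the provider's usual latency `miss_ms`. The function names are hypothetical.

```python
def expected_latency_ms(hit_rate: float, overhead_ms: float,
                        hit_ms: float, miss_ms: float) -> float:
    """Mean latency through the gateway under the simple model described above."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return overhead_ms + hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def break_even_hit_rate(overhead_ms: float, hit_ms: float, miss_ms: float) -> float:
    """Cache hit rate at which mean gateway latency equals a direct provider call.

    Solves overhead + h*hit + (1 - h)*miss = miss for h,
    which gives h = overhead / (miss - hit).
    """
    if miss_ms <= hit_ms:
        raise ValueError("a cache hit must be faster than a miss")
    return overhead_ms / (miss_ms - hit_ms)
```

With figures taken from the ranges above (30 ms overhead, 200 ms per cache hit, 2,000 ms per miss), `break_even_hit_rate(30, 200, 2000)` is 30 / 1800, or about 1.7%: under this model, any hit rate above that makes the average request faster through the gateway than a direct call.
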
Are AI gateways the same as API gateways?
Same architectural pattern (proxy + cross-cutting concerns); different cross-cutting concerns. API gateways handle authentication, rate limiting, request transformation, and routing across microservices. AI gateways handle model selection, response caching, token cost attribution, and provider failover — concerns specific to LLM traffic. An API gateway in front of LLM APIs would solve some problems but miss the LLM-specific ones.