AI gateway
A proxy that sits between applications and AI providers, handling routing, caching, observability, and governance for LLM API traffic.
How it works
An AI gateway sits between application code and AI providers (Anthropic, OpenAI, Google, and self-hosted models) and centralizes the cross-cutting concerns that every LLM-calling application eventually needs: authentication, request routing, response caching, observability, failover, governance, and cost attribution. Applications call the gateway as if it were the underlying provider; the gateway checks its cache, selects a provider, forwards the request, retries or fails over on errors, and logs the result.
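The request path described above can be sketched in a few lines of Python. This is a minimal in-process illustration, not how any particular gateway product is implemented: the `Gateway` class, the `Provider` callable signature, and the in-memory cache and log are all assumptions made for the example, and a real gateway would run as a network proxy with persistent storage.

```python
import hashlib
from typing import Callable

# Hypothetical provider signature: takes a prompt, returns completion text.
Provider = Callable[[str], str]


class Gateway:
    """Toy gateway: exact-match cache, ordered failover, request log."""

    def __init__(self, providers: list[tuple[str, Provider]]):
        self.providers = providers        # (name, callable), in preference order
        self.cache: dict[str, str] = {}   # exact-match response cache
        self.log: list[dict] = []         # stand-in for an observability sink

    def complete(self, prompt: str) -> str:
        # 1. Cache lookup, keyed on a hash of the exact prompt text.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            self.log.append({"provider": None, "cache_hit": True, "ok": True})
            return self.cache[key]

        # 2. Try each provider in order; fail over to the next on any error.
        errors = []
        for name, call in self.providers:
            try:
                text = call(prompt)
            except Exception as exc:
                errors.append(f"{name}: {exc}")
                self.log.append({"provider": name, "cache_hit": False, "ok": False})
                continue
            # 3. Store the response and record which provider served it.
            self.cache[key] = text
            self.log.append({"provider": name, "cache_hit": False, "ok": True})
            return text

        raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Application code would call `Gateway.complete` instead of a provider SDK; which provider answered, and whether the response came from the cache, is visible only in the log.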
The architectural pattern is the API gateway pattern from service-oriented architectures (Kong, Tyk, Cloudflare API Gateway), adapted to the shape of LLM traffic. The differences lie in the cross-cutting concerns that LLM traffic surfaces: routing decisions are usually about model choice (cost vs. quality) rather than service discovery; caching is multi-layer (exact-match, semantic, and provider-native) rather than driven by HTTP Cache-Control headers; and observability adds token and cost dimensions that ordinary HTTP traffic lacks.
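To make the cost-vs-quality routing and the token-and-cost dimensions concrete, here is a small Python sketch. The `ModelOption` fields, the integer quality tiers, and the per-1,000-token prices are hypothetical values chosen for illustration; they do not reflect any provider's actual pricing or any gateway's routing policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    quality_tier: int         # higher is better; hypothetical scale
    usd_per_1k_input: float   # assumed price per 1,000 input tokens
    usd_per_1k_output: float  # assumed price per 1,000 output tokens


def pick_model(options: list[ModelOption], min_quality: int) -> ModelOption:
    """Return the cheapest model that meets the requested quality tier."""
    eligible = [m for m in options if m.quality_tier >= min_quality]
    if not eligible:
        raise ValueError(f"no model meets quality tier {min_quality}")
    return min(eligible, key=lambda m: m.usd_per_1k_input + m.usd_per_1k_output)


def request_cost(model: ModelOption, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD, derived from token counts."""
    return (
        (input_tokens / 1000) * model.usd_per_1k_input
        + (output_tokens / 1000) * model.usd_per_1k_output
    )
```

The routing input here is a quality requirement rather than a service address, and the value recorded per request is a dollar cost computed from token counts, which is the kind of dimension an ordinary HTTP access log does not carry.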
When it matters
An AI gateway becomes worth deploying when at least one of these is true: (a) the application calls more than one AI provider, and provider-specific SDKs are creating integration drift; (b) AI spend is non-trivial, so cost engineering (caching, model selection, budget caps) justifies the gateway's overhead; (c) compliance or governance requires auditable spend and policy enforcement that the provider SDKs don't offer; (d) latency or reliability matters and the application needs automatic failover between providers.
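Budget caps and auditable spend, from points (b) and (c), might look like the following Python sketch. The `BudgetLedger` class, the per-team caps, and the audit tuple layout are assumptions for illustration only; a production gateway would persist this state and would usually check an estimated cost before forwarding the request.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    """Raised when a charge would push a team past its cap."""


class BudgetLedger:
    """Toy per-team spend tracker with a hard cap and an audit trail."""

    def __init__(self, caps_usd: dict[str, float]):
        self.caps = caps_usd              # team -> spending cap in USD
        self.spent = defaultdict(float)   # team -> spend so far in USD
        self.audit: list[tuple[str, float, str]] = []  # (team, cost, decision)

    def charge(self, team: str, cost_usd: float) -> None:
        # Teams without a configured cap get a cap of 0, so they are rejected.
        cap = self.caps.get(team, 0.0)
        if self.spent[team] + cost_usd > cap:
            self.audit.append((team, cost_usd, "rejected"))
            raise BudgetExceeded(f"{team} would exceed its cap of ${cap:.2f}")
        self.spent[team] += cost_usd
        self.audit.append((team, cost_usd, "allowed"))
```

Every decision, allowed or rejected, lands in the audit list, which is the property that makes the spend auditable rather than merely capped.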
When it doesn't matter: single-provider applications with low spend, where calling the provider's SDK directly (for example, OpenAI's) is the simpler architecture. The gateway adds roughly 5-30 ms of overhead per request and one more component to deploy and monitor; that overhead is only worth it when the gateway is doing meaningful work.
What an AI gateway is not
Common confusions worth disambiguating: an AI gateway is not a vector store (those are different products entirely, though some gateways integrate with them); it's not a model-hosting platform (Replicate, Together, and Fireworks fall in that bucket); and it's not a model-evaluation framework (LangSmith, Braintrust, and Promptfoo are eval-focused). Some products span multiple categories (Helicone is gateway plus observability; LiteLLM is gateway plus abstraction layer), but the core "gateway" role is specifically the proxy plus the cross-cutting-concerns layer.
The category today
As of 2026, the AI gateway category includes Prism, Portkey, Helicone, LiteLLM (open source), OpenRouter (multi-provider marketplace shape), Cloudflare AI Gateway (edge-native), Vercel AI Gateway (platform-bundled), and several others. The products differ mostly in which cross-cutting concern they emphasize: observability-first vs. caching-first vs. cost-first vs. routing-first. Most products do all four to some degree; the question is which one they lead with.