Question 1

Is an AI gateway the same as an LLM proxy or an LLM router?

Accepted Answer

In practice, yes: the three terms are used interchangeably. 'AI gateway' is the most common vendor-marketing term; 'LLM proxy' is the most technically precise; 'LLM router' tends to emphasize the model-selection function specifically. All three describe the same architectural pattern: a proxy that sits between application code and AI providers and centralizes cross-cutting concerns such as routing, caching, and cost tracking.
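To make the pattern concrete, here is a minimal sketch of a proxy that routes by model name and caches responses. Everything in it is hypothetical: the `Gateway` class, the `provider_a` and `provider_b` stand-ins, and the model names are illustrations only, not any vendor's API. A real gateway would make network calls and handle streaming, authentication, and cache expiry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A provider is modeled as a plain function: (model, prompt) -> completion text.
# Real providers would be HTTP clients; these are stand-ins for illustration.
Provider = Callable[[str, str], str]


def provider_a(model: str, prompt: str) -> str:
    return f"[provider_a:{model}] {prompt}"


def provider_b(model: str, prompt: str) -> str:
    return f"[provider_b:{model}] {prompt}"


@dataclass
class Gateway:
    providers: Dict[str, Provider]  # provider name -> callable
    routes: Dict[str, str]          # model name -> provider name
    cache: Dict[Tuple[str, str], str] = field(default_factory=dict)
    upstream_calls: int = 0         # how many requests actually reached a provider

    def complete(self, model: str, prompt: str) -> str:
        key = (model, prompt)
        if key in self.cache:
            # Cache hit: answer without contacting any provider.
            return self.cache[key]
        # Routing: pick the provider configured for this model.
        provider = self.providers[self.routes[model]]
        self.upstream_calls += 1
        result = provider(model, prompt)
        self.cache[key] = result
        return result


gateway = Gateway(
    providers={"provider_a": provider_a, "provider_b": provider_b},
    routes={"model-a": "provider_a", "model-b": "provider_b"},
)
gateway.complete("model-a", "hello")  # forwarded to provider_a
gateway.complete("model-a", "hello")  # served from the cache
```

The application only ever calls `gateway.complete`, so swapping providers or adding caching changes the gateway configuration rather than the application code.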

Question 2

Why not just call OpenAI's SDK directly?

Accepted Answer

If you use a single provider, spend little, and don't need caching, observability, or governance, you should. The SDK is simpler. AI gateways become worth deploying once you cross into multi-provider, multi-team, or cost-engineering territory; before then they add overhead without matching value.

Question 3

How much latency does an AI gateway add?

Accepted Answer

It depends on the gateway. A well-designed gateway adds 5-30ms when it is only proxying, and it can reduce latency on cache hits (a cache-hit response can take 100-300ms instead of 1-3 seconds). Edge-deployed gateways (Cloudflare AI Gateway, Prism's edge layer) can cut the gateway's own overhead to ~5ms by running the proxy logic at the data center nearest the caller.
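The trade-off can be worked through with a little arithmetic. The sketch below computes the expected per-request latency for a given cache hit rate. The function name and the sample numbers (a 30% hit rate and midpoints of the ranges quoted above) are assumptions for illustration, not measurements of any particular gateway.

```python
def expected_latency_ms(hit_rate: float, hit_ms: float,
                        miss_ms: float, overhead_ms: float) -> float:
    """Average latency per request through a caching gateway.

    hit_ms      -- total response time when the cache answers
    miss_ms     -- provider response time without a gateway
    overhead_ms -- extra time the gateway adds to a proxied (miss) request
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_ms + (1.0 - hit_rate) * (miss_ms + overhead_ms)


def break_even_hit_rate(hit_ms: float, miss_ms: float, overhead_ms: float) -> float:
    """Hit rate at which the gateway's average latency equals calling directly."""
    return overhead_ms / (miss_ms + overhead_ms - hit_ms)


# Illustrative inputs: 200 ms cache hit, 2000 ms provider call, 20 ms overhead.
with_gateway = expected_latency_ms(0.30, 200.0, 2000.0, 20.0)
threshold = break_even_hit_rate(200.0, 2000.0, 20.0)
print(f"average with gateway: {with_gateway:.0f} ms, direct: 2000 ms")
print(f"break-even hit rate: {threshold:.1%}")
```

With these assumed inputs the average comes to roughly 1474 ms against 2000 ms for a direct call, and the gateway breaks even on latency at a hit rate of about 1%. With no cache hits at all, the gateway is pure overhead.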

Question 4

Are AI gateways the same as API gateways?

Accepted Answer

Same architectural pattern (proxy + cross-cutting concerns); different cross-cutting concerns. API gateways handle authentication, rate limiting, request transformation, and routing across microservices. AI gateways handle model selection, response caching, token cost attribution, and provider failover — concerns specific to LLM traffic. An API gateway in front of LLM APIs would solve some problems but miss the LLM-specific ones.
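Two of the LLM-specific concerns named above, token cost attribution and provider failover, can be sketched in a few lines. The prices, model names, team names, and helper names here are all hypothetical; real pricing varies by provider and usually distinguishes input from output tokens.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

# Hypothetical prices in USD per 1,000 tokens (not real provider pricing).
PRICE_PER_1K = {"model-small": 0.5, "model-large": 5.0}


class CostLedger:
    """Accumulates token spend per team, something a generic API gateway does not track."""

    def __init__(self) -> None:
        self.spend = defaultdict(float)

    def record(self, team: str, model: str, tokens: int) -> float:
        cost = tokens / 1000 * PRICE_PER_1K[model]
        self.spend[team] += cost
        return cost


def call_with_failover(providers: List[Tuple[str, Callable[[str], str]]],
                       prompt: str) -> Tuple[str, str]:
    """Try each provider in order; return (provider name, response) from the first that succeeds."""
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")  # simulated outage


def healthy_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


ledger = CostLedger()
served_by, answer = call_with_failover(
    [("primary", flaky_primary), ("backup", healthy_backup)], "hello"
)
ledger.record("search-team", "model-small", 2000)  # 2,000 tokens at $0.50 per 1K
```

A conventional API gateway could retry a failed request, but it has no notion of tokens or per-model prices, which is why cost attribution tends to live in the AI-specific layer.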
