AI gateway comparison

AI gateway comparison 2026: Prism vs Portkey vs Helicone vs LiteLLM vs OpenRouter vs Cloudflare AI Gateway

Last updated:

· 15 min read

The honest 2026 comparison across the seven AI gateways developers actually evaluate — feature matrix, pricing, when each wins, the real engineering tradeoffs.

The AI gateway market in 2026 split into seven serious products, each leading with a different wedge: Prism leads with cost engineering (3-layer caching + edge replication + measured savings); Portkey leads with governance + guardrails; Helicone leads with deep observability; LiteLLM leads with open-source + model breadth (100+ LLMs, MIT-licensed, 48k GitHub stars); OpenRouter leads with the credit marketplace (400+ models, no subscription); Cloudflare AI Gateway leads with free + edge-native; Langfuse and LangSmith lead with observability + evaluation at an adjacent layer. This guide is the comprehensive cross-comparison — feature matrix across all seven, when each wins, what they actually cost at production scale, and the engineering tradeoffs that drive the decision. Written for engineering leaders making a buy decision with a quarter of runway to course-correct if they pick wrong.

How AI gateways are categorised

Before comparing seven products, it helps to know what they're for. An AI gateway is a proxy that sits between application code and the underlying LLM providers (OpenAI, Anthropic, Google, etc.), centralising concerns that don't belong in the application:

  • Routing — pick which provider/model handles each request
  • Caching — avoid repeated provider calls when responses can be reused
  • Observability — capture usage, cost, latency, errors per request
  • Governance — budgets, policy rules, audit logs, attribution
  • Failover + reliability — retry against alternate providers when one fails
  • Cost engineering — the discipline of making the bill smaller through caching + routing + provider-native passthrough

Every AI gateway implements some subset of these. Where they differ is which subset they lead with and how deeply each capability is engineered. A gateway that lists "caching" as a feature is qualitatively different from a gateway whose entire product is built around caching.

The seven products

1. Prism — cost engineering wedge

Managed SaaS. 3-layer response caching (exact + semantic + provider-native passthrough), edge KV replication via Cloudflare Workers, per-request savings headers, INR billing for Indian customers, first-party CLI + MCP server, per-project policy + budget governance + audit log. Pro $19/mo, Team $49/mo, with a free tier (50K input tokens/day) and BYOK + Pro-feature unlock arriving in v1.9. Curated catalog: 23 models across 8 direct provider integrations.

Lead with: measurable savings on every request, visible in X-Prism-Cache-Saved-Cents response headers and a live public counter on the landing page.

2. Portkey — governance + guardrails wedge

Managed SaaS. PII redaction, content filters, prompt management (templates + versioning), RBAC, semantic caching, broad provider catalog (1,600+ models across 50+ providers per their docs). Free Developer tier (10K logs/month gateway works past cap), Production $49/month (100K logs/month, 30-day retention, semantic cache, guardrails, RBAC), Enterprise custom-priced.

Lead with: "AI control plane" framing for compliance-aware enterprises. Strong on PII redaction + multi-team governance.

3. Helicone — observability wedge

Managed SaaS with self-hostable proxy code on GitHub. Deep request-level logging, prompt experiments, custom properties, OpenAI-compatible proxy. Free tier with paid scaling. Self-hostable for teams with strict data-residency requirements.

Lead with: "let me see what's happening in production" — request logs, cost dashboards, prompt iteration tooling.

4. LiteLLM — open-source + breadth wedge

Open-source (MIT-licensed), 48k+ GitHub stars. 100+ LLMs across major providers in OpenAI format. Self-hostable via Docker, Helm, pip. Caching available in OSS proxy (in-memory, Redis, Qdrant semantic, Redis semantic, S3, GCS). Enterprise tier (custom-priced) adds SSO, SAML, JWT auth, audit logs, custom SLAs, managed cloud.

Lead with: "the OSS default" — broad model catalog, self-hostable, community-maintained.

5. OpenRouter — marketplace wedge

Credit marketplace gateway. 400+ models across 60+ providers via unified credits. No subscription — buy credits ($10-$99+ at signup), spend across any model. Multi-model synthesis ("Fusion") launched as a labs feature in March 2026. Custom data policies for routing constraints.

Lead with: "Better prices, better uptime, no subscriptions" — model breadth + unified credit balance + pay-as-you-go.

6. Cloudflare AI Gateway — free + edge wedge

Free utility layer included with any Cloudflare account. Edge-native (every PoP is the gateway). Major providers including Workers AI, Anthropic, Google Gemini, OpenAI, Replicate. Caching, analytics, logging, rate limiting bundled. No separate AI Gateway subscription.

Lead with: "observe and control your AI applications" — free, integrated with broader Cloudflare platform, edge-native by design.

7. Langfuse + LangSmith — observability platforms (different layer)

Langfuse is open-source (Hobby free, Core $29/mo, Enterprise $2,499+/mo, SOC 2 + ISO27001 + HIPAA Pro+). LangSmith is LangChain's commercial offering (Developer $0, Plus $39/seat/mo, Enterprise custom).

Note: these are not direct AI-gateway competitors — they're observability + evaluation platforms that sit parallel to the request path (instrumenting the application's LLM calls) rather than inline (proxying the calls). Most teams that need rich agent observability run one of these alongside a separate gateway. Included in this comparison because they're frequently considered in the same evaluation conversation.

The cross-product matrix

The honest at-a-glance comparison. Each row maps to a capability; each column maps to a gateway. Filled with "✓" for a primary feature, "✓ with X" for partial / configurable, and "—" for not offered.

Capability Prism Portkey Helicone LiteLLM OpenRouter CF AI Gateway Langfuse / LangSmith
Inline proxy (in request path) — (parallel observability)
OpenAI-compatible endpoint n/a (instrumentation, not endpoint)
Multi-provider routing ✓ (curated 23 models / 8 providers) ✓ (1,600+ / 50+) ✓ (100+) ✓ (400+ / 60+) ✓ (major providers)
Exact-match caching ✓ Pro ✓ Redis/in-memory/disk/S3/GCS
Semantic caching ✓ default ✓ Production+ — surfaced ✓ Qdrant/Redis Semantic
Provider-native passthrough ✓ pass-through to customer — surfaced — surfaced Depends on self-host wiring — surfaced — surfaced
Per-request savings header ✓ X-Prism-Cache-Saved-Cents
Public live-savings counter
Edge replication ✓ Cloudflare KV — (centralized) Self-host any region Edge model serving ✓ native
Speculative parallel routing ✓ Sport mode Pro+
Per-project budget caps (soft-warn + hard-block) ✓ Team tier ✓ per-key Custom data policies Rate limiting
Per-project policy (denied models/modes) ✓ Pro+ ✓ custom data policies
Audit log (append-only) ✓ Pro 30d / Team 365d ✓ enterprise tier ✓ enterprise tier Logging only ✓ Enterprise
Multi-model synthesis (fusion) ✓ v1.7-B (gated off pending activation) ✓ Labs (Fusion)
Prompt management / templates ✓ experiments
PII redaction / content filters ✓ strong Enterprise tier
First-party CLI ssimplifi-cli litellm proxy Cloudflare wrangler
First-party MCP server ssimplifi-prism-mcp
Open-source self-host Partial (some components) ✓ proxy code ✓ MIT n/a (Cloudflare-native) ✓ Langfuse OSS
INR billing rail (Razorpay)
Free tier (managed) ✓ 50K tokens/day ✓ 10K logs/mo ✓ usage limits n/a (self-host OSS) — (credits required) ✓ free on all plans ✓ Hobby (Langfuse)
Entry paid tier $19/mo Pro $49/mo Production Paid scaling Enterprise custom Pay-per-token credits Free $29/mo Core (Langfuse), $39/seat (LangSmith)
SOC 2 / SSO certifications — (audit 2026 H2) ✓ enterprise ✓ Helicone-cloud ✓ Enterprise ✓ Cloudflare native ✓ Langfuse Pro+

The matrix above is what an engineering team uses to narrow down to 2-3 candidates; the deeper choice happens via dedicated comparison pages.

How to actually pick

The matrix surface decisions; the real question is which 2-3 products to evaluate seriously. Here's the decision framework, ordered by which question dominates:

"I want the bill to be smaller"

Top picks: Prism, then LiteLLM (self-hosted), then Cloudflare AI Gateway.

Prism leads on this dimension by design — 3-layer caching, edge replication, provider-native passthrough as customer-passed savings, public savings counter. LiteLLM self-hosted with caching configured can replicate much of this with engineering work. Cloudflare AI Gateway has caching but doesn't lead with savings as the wedge.

Skip: Portkey (governance, not cost), Helicone (observability, not cost), OpenRouter (marketplace, not engineering), Langfuse/LangSmith (different layer).

Full Prism vs Portkey · Prism vs LiteLLM · Prism vs Cloudflare AI Gateway

"I need governance, audit, compliance"

Top picks: Portkey, then Prism (Team tier), then LiteLLM Enterprise.

Portkey leads on this dimension — PII redaction, content filters, RBAC, prompt management with versioning, multi-team controls. Prism Team adds per-project budget caps + denied models + audit log but lacks PII redaction. LiteLLM Enterprise covers the compliance surface in their paid tier.

If SOC 2 / SSO / SAML certifications are required today, Portkey or Cloudflare are the answer; Prism's SOC 2 audit is 2026 H2 roadmap.

Prism vs Portkey

"I want deep observability + agent debugging"

Top picks: LangSmith, then Langfuse (open-source alternative), then Helicone.

LangSmith and Langfuse are purpose-built for observability and evaluation — span-level tracing, dataset experiments, LLM-as-judge scoring, prompt-version A/B testing. Helicone is observability-first as a gateway. Combine with a gateway (Prism, Portkey, LiteLLM) for the inline concerns.

Prism vs Helicone · Prism vs Langfuse · Prism vs LangSmith

"I need maximum model breadth"

Top picks: OpenRouter (400+ models), then LiteLLM (100+ models).

OpenRouter's marketplace model is purpose-built for breadth + unified credit billing. LiteLLM's open-source catalog covers most of the same surface with self-host operation. Prism's curated 23-model catalog covers the most-used models but won't suit teams calling exotic providers.

Prism vs OpenRouter · Prism vs LiteLLM

"I want free / self-host"

Top picks: LiteLLM (MIT, self-host), then Cloudflare AI Gateway (free on Cloudflare), then Helicone (self-hostable proxy).

LiteLLM is the canonical OSS choice — MIT-licensed, well-documented, large community. Cloudflare AI Gateway is free with your existing Cloudflare account, no separate signup. Helicone publishes proxy code on GitHub for self-hosting.

Prism vs LiteLLM · Prism vs Cloudflare AI Gateway

"I'm on the Indian market"

Top picks: Prism (INR billing via Razorpay), then any USD-only competitor.

Prism is the only gateway in this comparison with native INR billing for Indian-resident customers (₹1,500 Pro / ₹3,900 Team via Razorpay). All other gateways are USD-only, which creates real friction under RBI/FEMA rules. If you're paying with an Indian card, Prism removes the cross-border conversion step.

What this comparison doesn't cover

A few categories of AI infrastructure that aren't in this guide because they live at different layers:

  • Direct provider APIs. Calling OpenAI / Anthropic / Google directly with no gateway is always an option. The case for any gateway is the centralised concerns above; if you don't need them, direct calls work fine.
  • Inference infrastructure (vLLM, TGI, Ollama). These are self-hosted runtimes for serving open-weights models. Not gateways; the gateways above can proxy in front of them.
  • Embedding-specific services. Cohere, Voyage, Jina. Most gateways focus on chat completions; embedding-only workloads have their own ecosystem.

Migration paths between gateways

Most gateways are OpenAI-compatible, so switching is structurally a base-URL + API-key change. The friction lives in:

  • Gateway-specific extensions (custom headers for routing, observability, caching) don't port cleanly. Code that depends on X-Prism-Mode or x-portkey-virtual-key needs adjustment.
  • Prompt-management content locked in a vendor's prompt library has to be migrated manually.
  • Dashboard-side configuration (budget rules, policy, alerts) has to be re-created in the destination platform.
  • Billing relationships (managed-billing balance vs marketplace credits vs OSS self-host) shift with the gateway choice — plan for it.

A typical migration between two managed gateways is 1-2 days of integration + 1 week of soak-testing. Between a self-hosted gateway and a managed one (either direction) it's longer because the infrastructure-operating discipline transfers either onto or off of your team.

Decision framework

If you're actively picking right now:

  1. Identify the dominant concern. Cost? Governance? Observability? Breadth? Free/self-host? Geography? The matrix above has the answer.
  2. Narrow to 2-3 candidates. Don't evaluate all seven seriously; pick the top picks from your dominant concern's row.
  3. Run a 1-week pilot on each. Real traffic, real workloads, real numbers. Vendor pitches don't survive contact with your actual prompt distribution.
  4. Measure: cost reduction, p99 latency, time-to-debug-an-incident. These three are the production signals that matter.
  5. Pick the one that fits your team's engineering style. Self-host suits some teams; managed suits others. The "best" product is the one your team will actually operate well.

The AI gateway space is healthy in 2026 — multiple credible products with different wedges. The decision rotates on what you optimise for, not on which product is "best."

Where to go next

For per-pairing depth, the dedicated comparison pages cover migration code, pricing math, and per-claim verification:

For the foundational pillars: AI API caching (the cost-engineering wedge), LLM budget governance (the FinOps surface), multi-region LLM API (the edge story).

For modelling your own workload: savings calculator.


Frequently asked questions

Which AI gateway is best in 2026?

There isn't one. The market has split into seven serious products, each leading with a different wedge. Pick based on your dominant concern: Prism for cost engineering, Portkey for governance, Helicone for observability, LiteLLM for OSS + breadth, OpenRouter for marketplace breadth, Cloudflare AI Gateway for free + edge-native, Langfuse/LangSmith for observability at an adjacent layer.

Can I run multiple gateways simultaneously?

Yes. The common combinations: a primary gateway (Prism, Portkey, LiteLLM) for production traffic + Langfuse or LangSmith in parallel for agent observability. Or different gateways for different projects depending on each workload's dominant concern. The two-layer pattern (inline gateway + parallel observability platform) is the canonical mature setup.

How does the pricing actually compare at scale?

Workload-dependent. At $1-5K/month LLM spend: Prism Pro ($19) and similar managed entry tiers are typically cheaper than the engineering hours to operate LiteLLM well. At $5-20K/month: the choice depends on cache-hit-rate, team size, governance needs. At $20K+/month with dedicated SRE capacity: self-host (LiteLLM) often wins on raw token cost; managed gateways win on operational simplicity + feature depth. The crossover varies per team.

Are these gateways OpenAI-compatible?

Yes, all of them expose an OpenAI-compatible Chat Completions endpoint. Switching between them is a base-URL + key change for the core API. Gateway-specific extensions (custom headers) need re-mapping during migration.

Which gateway is fastest?

Cache-hit latency: Cloudflare AI Gateway and Prism (edge-replicated) are roughly equivalent in the sub-200ms range. Centralised gateways (Portkey, Helicone, OpenRouter) pay the central-region round-trip distance. Cache-miss latency is dominated by the provider call itself and is similar across all gateways.

Are any of these going away?

The AI gateway market is well-funded and growing. The seven products covered here are all backed by serious investment (or by free Cloudflare integration in the case of CF AI Gateway). Consolidation is possible but the differentiation between wedges is real enough that none of them seem at immediate risk.

What about Vercel AI Gateway?

Vercel announced AI Gateway features but the surface in 2026 is positioned more as a developer-tooling layer (closely integrated with Next.js + the broader Vercel platform) than as a standalone AI gateway product. For Vercel-native applications it's a reasonable choice; for the broader gateway evaluation it doesn't differentiate enough yet to warrant a dedicated comparison page.

What about home-grown gateways?

Many companies build their own at scale. The case for a custom build: specific compliance, data-residency, or vendor-relationship requirements that don't fit any managed product. The case against: 6-12 months of engineering work plus ongoing operational burden, against a managed gateway that ships the same surface in days. Most teams that start home-grown end up regretting it; teams that succeed are usually well past $20K/month spend with dedicated platform engineering.


If you're narrowing your choice, the pairwise comparison pages above have the per-vendor depth + migration code samples. The AI API caching guide and LLM budget governance guide cover the foundational disciplines orthogonal to gateway choice.

Deep dives on ai gateway comparison

Five cluster posts unpack the sub-topics of this pillar. Each ships independently as part of the content calendar.

See your savings before you sign up

Run our calculator on your own workload. Real provider rates, real cache math, no email gate.

Frequently asked questions

What is ai gateway comparison?
Side-by-side feature matrix for every major AI gateway in 2026. Prism covers this topic from the perspective of an AI API proxy that ships measured production data on every request — not vendor estimates.
How does Prism handle ai gateway comparison?
Prism is an OpenAI-compatible AI API proxy that addresses ai gateway comparison directly. See the deep-dive posts in this guide for the per-sub-topic implementation details, or jump to the savings calculator to model the impact on your workload.