AI gateway comparison 2026: Prism vs Portkey vs Helicone vs LiteLLM vs OpenRouter vs Cloudflare AI Gateway
Last updated:
· 15 min readThe honest 2026 comparison across the seven AI gateways developers actually evaluate — feature matrix, pricing, when each wins, the real engineering tradeoffs.
The AI gateway market in 2026 split into seven serious products, each leading with a different wedge: Prism leads with cost engineering (3-layer caching + edge replication + measured savings); Portkey leads with governance + guardrails; Helicone leads with deep observability; LiteLLM leads with open-source + model breadth (100+ LLMs, MIT-licensed, 48k GitHub stars); OpenRouter leads with the credit marketplace (400+ models, no subscription); Cloudflare AI Gateway leads with free + edge-native; Langfuse and LangSmith lead with observability + evaluation at an adjacent layer. This guide is the comprehensive cross-comparison — feature matrix across all seven, when each wins, what they actually cost at production scale, and the engineering tradeoffs that drive the decision. Written for engineering leaders making a buy decision with a quarter of runway to course-correct if they pick wrong.
How AI gateways are categorised
Before comparing seven products, it helps to know what they're for. An AI gateway is a proxy that sits between application code and the underlying LLM providers (OpenAI, Anthropic, Google, etc.), centralising concerns that don't belong in the application:
- Routing — pick which provider/model handles each request
- Caching — avoid repeated provider calls when responses can be reused
- Observability — capture usage, cost, latency, errors per request
- Governance — budgets, policy rules, audit logs, attribution
- Failover + reliability — retry against alternate providers when one fails
- Cost engineering — the discipline of making the bill smaller through caching + routing + provider-native passthrough
Every AI gateway implements some subset of these. Where they differ is which subset they lead with and how deeply each capability is engineered. A gateway that lists "caching" as a feature is qualitatively different from a gateway whose entire product is built around caching.
The seven products
1. Prism — cost engineering wedge
Managed SaaS. 3-layer response caching (exact + semantic + provider-native passthrough), edge KV replication via Cloudflare Workers, per-request savings headers, INR billing for Indian customers, first-party CLI + MCP server, per-project policy + budget governance + audit log. Pro $19/mo, Team $49/mo, with a free tier (50K input tokens/day) and BYOK + Pro-feature unlock arriving in v1.9. Curated catalog: 23 models across 8 direct provider integrations.
Lead with: measurable savings on every request, visible in X-Prism-Cache-Saved-Cents response headers and a live public counter on the landing page.
2. Portkey — governance + guardrails wedge
Managed SaaS. PII redaction, content filters, prompt management (templates + versioning), RBAC, semantic caching, broad provider catalog (1,600+ models across 50+ providers per their docs). Free Developer tier (10K logs/month gateway works past cap), Production $49/month (100K logs/month, 30-day retention, semantic cache, guardrails, RBAC), Enterprise custom-priced.
Lead with: "AI control plane" framing for compliance-aware enterprises. Strong on PII redaction + multi-team governance.
3. Helicone — observability wedge
Managed SaaS with self-hostable proxy code on GitHub. Deep request-level logging, prompt experiments, custom properties, OpenAI-compatible proxy. Free tier with paid scaling. Self-hostable for teams with strict data-residency requirements.
Lead with: "let me see what's happening in production" — request logs, cost dashboards, prompt iteration tooling.
4. LiteLLM — open-source + breadth wedge
Open-source (MIT-licensed), 48k+ GitHub stars. 100+ LLMs across major providers in OpenAI format. Self-hostable via Docker, Helm, pip. Caching available in OSS proxy (in-memory, Redis, Qdrant semantic, Redis semantic, S3, GCS). Enterprise tier (custom-priced) adds SSO, SAML, JWT auth, audit logs, custom SLAs, managed cloud.
Lead with: "the OSS default" — broad model catalog, self-hostable, community-maintained.
5. OpenRouter — marketplace wedge
Credit marketplace gateway. 400+ models across 60+ providers via unified credits. No subscription — buy credits ($10-$99+ at signup), spend across any model. Multi-model synthesis ("Fusion") launched as a labs feature in March 2026. Custom data policies for routing constraints.
Lead with: "Better prices, better uptime, no subscriptions" — model breadth + unified credit balance + pay-as-you-go.
6. Cloudflare AI Gateway — free + edge wedge
Free utility layer included with any Cloudflare account. Edge-native (every PoP is the gateway). Major providers including Workers AI, Anthropic, Google Gemini, OpenAI, Replicate. Caching, analytics, logging, rate limiting bundled. No separate AI Gateway subscription.
Lead with: "observe and control your AI applications" — free, integrated with broader Cloudflare platform, edge-native by design.
7. Langfuse + LangSmith — observability platforms (different layer)
Langfuse is open-source (Hobby free, Core $29/mo, Enterprise $2,499+/mo, SOC 2 + ISO27001 + HIPAA Pro+). LangSmith is LangChain's commercial offering (Developer $0, Plus $39/seat/mo, Enterprise custom).
Note: these are not direct AI-gateway competitors — they're observability + evaluation platforms that sit parallel to the request path (instrumenting the application's LLM calls) rather than inline (proxying the calls). Most teams that need rich agent observability run one of these alongside a separate gateway. Included in this comparison because they're frequently considered in the same evaluation conversation.
The cross-product matrix
The honest at-a-glance comparison. Each row maps to a capability; each column maps to a gateway. Filled with "✓" for a primary feature, "✓ with X" for partial / configurable, and "—" for not offered.
| Capability | Prism | Portkey | Helicone | LiteLLM | OpenRouter | CF AI Gateway | Langfuse / LangSmith |
|---|---|---|---|---|---|---|---|
| Inline proxy (in request path) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — (parallel observability) |
| OpenAI-compatible endpoint | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | n/a (instrumentation, not endpoint) |
| Multi-provider routing | ✓ (curated 23 models / 8 providers) | ✓ (1,600+ / 50+) | ✓ | ✓ (100+) | ✓ (400+ / 60+) | ✓ (major providers) | — |
| Exact-match caching | ✓ | ✓ | ✓ Pro | ✓ Redis/in-memory/disk/S3/GCS | — | ✓ | — |
| Semantic caching | ✓ default | ✓ Production+ | — surfaced | ✓ Qdrant/Redis Semantic | — | — | — |
| Provider-native passthrough | ✓ pass-through to customer | — surfaced | — surfaced | Depends on self-host wiring | — surfaced | — surfaced | — |
| Per-request savings header | ✓ X-Prism-Cache-Saved-Cents | — | — | — | — | — | — |
| Public live-savings counter | ✓ | — | — | — | — | — | — |
| Edge replication | ✓ Cloudflare KV | — (centralized) | — | Self-host any region | Edge model serving | ✓ native | — |
| Speculative parallel routing | ✓ Sport mode Pro+ | — | — | — | — | — | — |
| Per-project budget caps (soft-warn + hard-block) | ✓ Team tier | ✓ | ✓ | ✓ per-key | Custom data policies | Rate limiting | — |
| Per-project policy (denied models/modes) | ✓ Pro+ | ✓ | ✓ | ✓ | ✓ custom data policies | — | — |
| Audit log (append-only) | ✓ Pro 30d / Team 365d | ✓ enterprise tier | ✓ | ✓ enterprise tier | — | Logging only | ✓ Enterprise |
| Multi-model synthesis (fusion) | ✓ v1.7-B (gated off pending activation) | — | — | — | ✓ Labs (Fusion) | — | — |
| Prompt management / templates | — | ✓ | ✓ experiments | — | — | — | ✓ |
| PII redaction / content filters | — | ✓ strong | — | Enterprise tier | — | — | — |
| First-party CLI | ✓ ssimplifi-cli |
— | — | ✓ litellm proxy |
— | Cloudflare wrangler | — |
| First-party MCP server | ✓ ssimplifi-prism-mcp |
— | — | — | — | — | — |
| Open-source self-host | — | Partial (some components) | ✓ proxy code | ✓ MIT | — | n/a (Cloudflare-native) | ✓ Langfuse OSS |
| INR billing rail (Razorpay) | ✓ | — | — | — | — | — | — |
| Free tier (managed) | ✓ 50K tokens/day | ✓ 10K logs/mo | ✓ usage limits | n/a (self-host OSS) | — (credits required) | ✓ free on all plans | ✓ Hobby (Langfuse) |
| Entry paid tier | $19/mo Pro | $49/mo Production | Paid scaling | Enterprise custom | Pay-per-token credits | Free | $29/mo Core (Langfuse), $39/seat (LangSmith) |
| SOC 2 / SSO certifications | — (audit 2026 H2) | ✓ enterprise | ✓ Helicone-cloud | ✓ Enterprise | — | ✓ Cloudflare native | ✓ Langfuse Pro+ |
The matrix above is what an engineering team uses to narrow down to 2-3 candidates; the deeper choice happens via dedicated comparison pages.
How to actually pick
The matrix surface decisions; the real question is which 2-3 products to evaluate seriously. Here's the decision framework, ordered by which question dominates:
"I want the bill to be smaller"
Top picks: Prism, then LiteLLM (self-hosted), then Cloudflare AI Gateway.
Prism leads on this dimension by design — 3-layer caching, edge replication, provider-native passthrough as customer-passed savings, public savings counter. LiteLLM self-hosted with caching configured can replicate much of this with engineering work. Cloudflare AI Gateway has caching but doesn't lead with savings as the wedge.
Skip: Portkey (governance, not cost), Helicone (observability, not cost), OpenRouter (marketplace, not engineering), Langfuse/LangSmith (different layer).
Full Prism vs Portkey · Prism vs LiteLLM · Prism vs Cloudflare AI Gateway
"I need governance, audit, compliance"
Top picks: Portkey, then Prism (Team tier), then LiteLLM Enterprise.
Portkey leads on this dimension — PII redaction, content filters, RBAC, prompt management with versioning, multi-team controls. Prism Team adds per-project budget caps + denied models + audit log but lacks PII redaction. LiteLLM Enterprise covers the compliance surface in their paid tier.
If SOC 2 / SSO / SAML certifications are required today, Portkey or Cloudflare are the answer; Prism's SOC 2 audit is 2026 H2 roadmap.
"I want deep observability + agent debugging"
Top picks: LangSmith, then Langfuse (open-source alternative), then Helicone.
LangSmith and Langfuse are purpose-built for observability and evaluation — span-level tracing, dataset experiments, LLM-as-judge scoring, prompt-version A/B testing. Helicone is observability-first as a gateway. Combine with a gateway (Prism, Portkey, LiteLLM) for the inline concerns.
Prism vs Helicone · Prism vs Langfuse · Prism vs LangSmith
"I need maximum model breadth"
Top picks: OpenRouter (400+ models), then LiteLLM (100+ models).
OpenRouter's marketplace model is purpose-built for breadth + unified credit billing. LiteLLM's open-source catalog covers most of the same surface with self-host operation. Prism's curated 23-model catalog covers the most-used models but won't suit teams calling exotic providers.
Prism vs OpenRouter · Prism vs LiteLLM
"I want free / self-host"
Top picks: LiteLLM (MIT, self-host), then Cloudflare AI Gateway (free on Cloudflare), then Helicone (self-hostable proxy).
LiteLLM is the canonical OSS choice — MIT-licensed, well-documented, large community. Cloudflare AI Gateway is free with your existing Cloudflare account, no separate signup. Helicone publishes proxy code on GitHub for self-hosting.
Prism vs LiteLLM · Prism vs Cloudflare AI Gateway
"I'm on the Indian market"
Top picks: Prism (INR billing via Razorpay), then any USD-only competitor.
Prism is the only gateway in this comparison with native INR billing for Indian-resident customers (₹1,500 Pro / ₹3,900 Team via Razorpay). All other gateways are USD-only, which creates real friction under RBI/FEMA rules. If you're paying with an Indian card, Prism removes the cross-border conversion step.
What this comparison doesn't cover
A few categories of AI infrastructure that aren't in this guide because they live at different layers:
- Direct provider APIs. Calling OpenAI / Anthropic / Google directly with no gateway is always an option. The case for any gateway is the centralised concerns above; if you don't need them, direct calls work fine.
- Inference infrastructure (vLLM, TGI, Ollama). These are self-hosted runtimes for serving open-weights models. Not gateways; the gateways above can proxy in front of them.
- Embedding-specific services. Cohere, Voyage, Jina. Most gateways focus on chat completions; embedding-only workloads have their own ecosystem.
Migration paths between gateways
Most gateways are OpenAI-compatible, so switching is structurally a base-URL + API-key change. The friction lives in:
- Gateway-specific extensions (custom headers for routing, observability, caching) don't port cleanly. Code that depends on
X-Prism-Modeorx-portkey-virtual-keyneeds adjustment. - Prompt-management content locked in a vendor's prompt library has to be migrated manually.
- Dashboard-side configuration (budget rules, policy, alerts) has to be re-created in the destination platform.
- Billing relationships (managed-billing balance vs marketplace credits vs OSS self-host) shift with the gateway choice — plan for it.
A typical migration between two managed gateways is 1-2 days of integration + 1 week of soak-testing. Between a self-hosted gateway and a managed one (either direction) it's longer because the infrastructure-operating discipline transfers either onto or off of your team.
Decision framework
If you're actively picking right now:
- Identify the dominant concern. Cost? Governance? Observability? Breadth? Free/self-host? Geography? The matrix above has the answer.
- Narrow to 2-3 candidates. Don't evaluate all seven seriously; pick the top picks from your dominant concern's row.
- Run a 1-week pilot on each. Real traffic, real workloads, real numbers. Vendor pitches don't survive contact with your actual prompt distribution.
- Measure: cost reduction, p99 latency, time-to-debug-an-incident. These three are the production signals that matter.
- Pick the one that fits your team's engineering style. Self-host suits some teams; managed suits others. The "best" product is the one your team will actually operate well.
The AI gateway space is healthy in 2026 — multiple credible products with different wedges. The decision rotates on what you optimise for, not on which product is "best."
Where to go next
For per-pairing depth, the dedicated comparison pages cover migration code, pricing math, and per-claim verification:
- Prism vs Portkey
- Prism vs Helicone
- Prism vs LiteLLM
- Prism vs OpenRouter
- Prism vs Cloudflare AI Gateway
- Prism vs Langfuse
- Prism vs LangSmith
For the foundational pillars: AI API caching (the cost-engineering wedge), LLM budget governance (the FinOps surface), multi-region LLM API (the edge story).
For modelling your own workload: savings calculator.
Frequently asked questions
Which AI gateway is best in 2026?
There isn't one. The market has split into seven serious products, each leading with a different wedge. Pick based on your dominant concern: Prism for cost engineering, Portkey for governance, Helicone for observability, LiteLLM for OSS + breadth, OpenRouter for marketplace breadth, Cloudflare AI Gateway for free + edge-native, Langfuse/LangSmith for observability at an adjacent layer.
Can I run multiple gateways simultaneously?
Yes. The common combinations: a primary gateway (Prism, Portkey, LiteLLM) for production traffic + Langfuse or LangSmith in parallel for agent observability. Or different gateways for different projects depending on each workload's dominant concern. The two-layer pattern (inline gateway + parallel observability platform) is the canonical mature setup.
How does the pricing actually compare at scale?
Workload-dependent. At $1-5K/month LLM spend: Prism Pro ($19) and similar managed entry tiers are typically cheaper than the engineering hours to operate LiteLLM well. At $5-20K/month: the choice depends on cache-hit-rate, team size, governance needs. At $20K+/month with dedicated SRE capacity: self-host (LiteLLM) often wins on raw token cost; managed gateways win on operational simplicity + feature depth. The crossover varies per team.
Are these gateways OpenAI-compatible?
Yes, all of them expose an OpenAI-compatible Chat Completions endpoint. Switching between them is a base-URL + key change for the core API. Gateway-specific extensions (custom headers) need re-mapping during migration.
Which gateway is fastest?
Cache-hit latency: Cloudflare AI Gateway and Prism (edge-replicated) are roughly equivalent in the sub-200ms range. Centralised gateways (Portkey, Helicone, OpenRouter) pay the central-region round-trip distance. Cache-miss latency is dominated by the provider call itself and is similar across all gateways.
Are any of these going away?
The AI gateway market is well-funded and growing. The seven products covered here are all backed by serious investment (or by free Cloudflare integration in the case of CF AI Gateway). Consolidation is possible but the differentiation between wedges is real enough that none of them seem at immediate risk.
What about Vercel AI Gateway?
Vercel announced AI Gateway features but the surface in 2026 is positioned more as a developer-tooling layer (closely integrated with Next.js + the broader Vercel platform) than as a standalone AI gateway product. For Vercel-native applications it's a reasonable choice; for the broader gateway evaluation it doesn't differentiate enough yet to warrant a dedicated comparison page.
What about home-grown gateways?
Many companies build their own at scale. The case for a custom build: specific compliance, data-residency, or vendor-relationship requirements that don't fit any managed product. The case against: 6-12 months of engineering work plus ongoing operational burden, against a managed gateway that ships the same surface in days. Most teams that start home-grown end up regretting it; teams that succeed are usually well past $20K/month spend with dedicated platform engineering.
If you're narrowing your choice, the pairwise comparison pages above have the per-vendor depth + migration code samples. The AI API caching guide and LLM budget governance guide cover the foundational disciplines orthogonal to gateway choice.
Deep dives on ai gateway comparison
Five cluster posts unpack the sub-topics of this pillar. Each ships independently as part of the content calendar.