The free AI gateway, reframed: bring your own key and keep the savings

Most 'free AI gateway' tiers meter your logs and stop recording at a cap. Prism's free tier is different: bring your own provider keys, get a full multi-model gateway with caching and routing, and the savings land on your own bill — $0 markup.

Search "free AI gateway" and you'll find a familiar shape: a free tier that meters your logs. You get 10,000 log lines a month; past that, the gateway keeps proxying but quietly stops recording, and the vendor's own docs often label the tier "not suitable for production." It's a trial of the dashboard, not a free way to run AI in production.

We think free should mean something more useful: bring your own provider keys, get a real multi-model gateway on top of them, and keep the savings the gateway creates. That's what Prism's free tier is now — and this post explains the reframe, honestly, including where the limits are.

What "bring your own key" actually does here

If you already pay OpenAI, Anthropic, or Groq directly, you have API keys. Register them with Prism and one endpoint — api.ssimplifi.com/v1, OpenAI-compatible — becomes your personal multi-model gateway across 8 providers (OpenAI, Anthropic, Google, Groq, DeepSeek, Fireworks, Cerebras, Mistral). Add as many keys as you want.

On top of your keys you get the parts that are annoying to build yourself:

Intelligent routing: Prism classifies each request and sends it to the cheapest model that can handle it well. You choose the routing mode per request with the X-Prism-Mode header: eco, balanced, or sport.
Three-layer caching: exact match (sub-10ms, byte-identical), semantic match (near-duplicate prompts), and provider-native passthrough (Anthropic prompt caching, OpenAI cached input). See AI API caching as a discipline for the full picture.
Failover, session memory, observability, and Fusion: automatic cross-provider failover, server-side conversation memory, a usage dashboard with per-feature cost attribution, and multi-model Fusion mode (X-Prism-Mode: fusion).
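
To make the exact-match layer concrete, here is a minimal sketch of how a byte-identical cache can work in general. It is not Prism's implementation, which this post does not describe: the class name ExactMatchCache, the in-memory dict, and the call_provider callback are all illustrative assumptions.

```python
import hashlib
from typing import Callable, Dict


class ExactMatchCache:
    """Illustrative exact-match layer: identical request bytes, identical reply."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.hits = 0

    def fetch(self, request_body: bytes, call_provider: Callable[[bytes], bytes]) -> bytes:
        # Key on a digest of the raw bytes, so any difference
        # (even a single space) counts as a miss.
        key = hashlib.sha256(request_body).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        response = call_provider(request_body)  # only reached on a miss
        self._store[key] = response
        return response
```

The strictness is the point of this layer: it only ever returns a reply for a request it has seen byte for byte, which is why near-duplicate prompts need the separate semantic layer.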

The key economic point: Prism takes no token markup on BYOK requests. Your provider bills you directly at their list price. Prism never sits in the money path for those calls.

The savings land on your bill

This is the part the logs-metered free tiers can't offer. When Prism's cache serves a response, that's a call your provider never charged you for. When routing sends a simple query to a cheaper-but-capable model, that's the price delta you keep. Because you're on your own keys, every one of those savings shows up on your own provider invoice — not as a number in someone else's dashboard.

Each response carries the receipt, too: X-Prism-Cache-Status, X-Prism-Cache-Saved-Cents, and the model that actually served the request. You can see what you saved on the call you just made.
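
A minimal sketch of reading that receipt in Python. The two header names come from the text above; the value formats (a short status string, a numeric cents value) and the helper name read_cache_receipt are assumptions, so check a real response before relying on them.

```python
from typing import Mapping, Optional, TypedDict


class CacheReceipt(TypedDict):
    status: Optional[str]
    saved_cents: Optional[float]


def read_cache_receipt(headers: Mapping[str, str]) -> CacheReceipt:
    """Pull the Prism cache headers out of a response-header mapping.

    Assumes the saved-cents value parses as a number; a missing or
    non-numeric value is reported as None rather than guessed.
    """
    # Plain dicts are case-sensitive, so normalize header names first.
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("x-prism-cache-status")
    raw_saved = lowered.get("x-prism-cache-saved-cents")
    try:
        saved = float(raw_saved) if raw_saved is not None else None
    except ValueError:
        saved = None
    return {"status": status, "saved_cents": saved}
```

With the OpenAI Python SDK (v1 and later), client.chat.completions.with_raw_response.create(...) returns a wrapper whose .headers can be passed straight to this helper and whose .parse() yields the usual completion object.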

Why this beats a logs-metered free tier

A logs cap protects the vendor's storage bill. It does nothing for your AI bill. The moment your free logs run out, you're either flying blind or upgrading — and you still haven't saved a cent on the actual model spend, which is the line item that hurts.

Prism's free + BYOK tier inverts that. There's no log-recording cliff and full caching behaviour is on from the first request, so the free tier is doing the one job you came for: cutting the bill. For a head-to-head on the gateway feature matrix and the free-tier difference, see Prism vs Portkey.

What's free, and where the limits are (honestly)

Free + BYOK is governed by a fair-use cap — currently 1,000 requests/day and 30,000/month. That comfortably covers hobby projects and serious evaluation. Production-scale workloads will cross it, and that's the moment a subscription makes sense: a subscription removes the cap (unlimited usage) and the feature set is otherwise the same. You're paying to lift the ceiling, not to unlock the gateway.
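
To see when a workload would cross that cap, a client-side counter is enough. The sketch below is not part of Prism: the class name FairUseBudget is illustrative, and it assumes days and months roll over in UTC, which this post does not specify. Only the two limits come from the text above.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Optional

DAILY_CAP = 1_000     # requests/day, from the post
MONTHLY_CAP = 30_000  # requests/month, from the post


class FairUseBudget:
    """Counts requests locally and reports whether another one fits."""

    def __init__(self) -> None:
        self._per_day: Counter = Counter()
        self._per_month: Counter = Counter()

    def try_spend(self, now: Optional[datetime] = None) -> bool:
        """Record one request if both caps allow it; return False otherwise."""
        now = now or datetime.now(timezone.utc)
        day = now.strftime("%Y-%m-%d")    # assumed UTC day boundary
        month = now.strftime("%Y-%m")     # assumed UTC month boundary
        if self._per_day[day] >= DAILY_CAP or self._per_month[month] >= MONTHLY_CAP:
            return False
        self._per_day[day] += 1
        self._per_month[month] += 1
        return True
```

Calling try_spend() before each request gives an early signal that a project has outgrown the free tier, instead of finding out from a rejected call.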

Two honest caveats:

8 of 10 providers are live for BYOK today. OpenAI, Anthropic, Google, Groq, DeepSeek, Fireworks, Cerebras, and Mistral work now. xAI and Perplexity are wired and waiting on account activation — coming soon.
No key? You still get a free tier. If you don't want to bring a key, the managed free tier gives you 50,000 input tokens/day on Prism-managed keys, no credit card.

Keys are encrypted at rest with AES-256-GCM, never logged, and never returned by the API. The security model is documented in the BYOK docs.

Start in one URL change

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ssimplifi.com/v1",   # point the SDK at Prism
    api_key="prism_sk_...",                      # your Prism key, not a provider key
)

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
    extra_headers={"X-Prism-Mode": "balanced"},  # eco · balanced · sport · fusion
)

Register your provider keys under Dashboard → Providers, point your existing OpenAI SDK at Prism, and the routing, caching, failover, and savings math run on top of your own keys.

A free AI gateway shouldn't be a trial that expires when the logs run out. It should be the thing that quietly makes your AI bill smaller — on your keys, on your invoice. That's the version we built.

Start free with your own key →