OpenAI-compatible API

OpenAI-compatible API: the substrate eating the LLM market

Last updated:

· 12 min read

Why OpenAI-compatible became the de facto interoperability layer for LLM APIs in 2026, what it actually covers, the implementation gotchas, and how to migrate between providers without rewriting code.

By 2026, "OpenAI-compatible" became the S3-of-LLM-APIs — the de facto interoperability layer that almost every credible LLM provider and AI gateway implements. The OpenAI SDK works against Anthropic (with shims), Google Gemini, Mistral, DeepSeek, every aggregator gateway (Prism, Portkey, Helicone, LiteLLM, OpenRouter, Cloudflare AI Gateway), and most self-hosted runtimes (vLLM, Ollama, Text Generation Inference). Switching providers is a base-URL + API-key change. Vendor lock-in at the LLM API layer is materially weaker than at any comparable cloud layer. This guide is the comprehensive reference: what "OpenAI-compatible" actually covers, where compatibility breaks, how to migrate, and which gateways to consider when you want centralised control over a multi-provider OpenAI-compatible substrate.

How OpenAI-compatible became the substrate

The pattern repeats from earlier cloud-infrastructure cycles. AWS published S3 in 2006; competitors built S3-compatible APIs (Backblaze B2, Cloudflare R2, MinIO, Wasabi) until "S3-compatible" was a feature checkbox for any object store. The same arc happened with PostgreSQL wire protocol (Aurora, CockroachDB, YugabyteDB), with Redis (KeyDB, DragonflyDB), with Kafka (Redpanda, WarpStream). The market consolidates around an interoperability layer rather than around the leading vendor.

For LLM APIs, OpenAI's Chat Completions endpoint (POST /v1/chat/completions) became the substrate for three reasons:

1. Developer mindshare landed there first. OpenAI shipped GPT-3.5 and GPT-4 with a well-designed Python SDK before competitors had comparable tooling. Engineers wrote their first LLM application code against openai-python. When alternatives emerged, application code already existed against that interface.

2. The protocol was simple enough to be compatible-friendly. Chat Completions is a JSON request-response over HTTP with a tractable schema (messages array with role + content, model name, sampling parameters, streaming via SSE). Competitors didn't have to negotiate a complex binary protocol or proprietary SDK — they could implement the HTTP shape themselves.

3. Vendor incentives aligned. Anthropic, Google, and others wanted developers to be able to switch to them with minimal code change. An OpenAI-compatible endpoint reduced switching friction; reduced switching friction reduced OpenAI's lock-in and made the alternative market viable. The same logic that made S3-compatible APIs serve the AWS-alternative market.

The result by 2026: OpenAI-compatible is no longer a marketing claim; it's the assumed default. Any LLM API or gateway that doesn't support it has to justify why.

What "OpenAI-compatible" actually covers

There is no formal specification published by OpenAI defining "OpenAI-compatible." The conventional surface is documented across the ecosystem (openai.com's API reference is the de facto canonical reference). Implementations vary in completeness.

The universal surface — implemented by essentially every claimant:

  • POST /v1/chat/completions with the standard request body: model, messages (array of role/content objects), temperature, top_p, max_tokens, stop, stream
  • Streaming via Server-Sent Events with the standard data: prefix, JSON chunks, and the [DONE] terminator
  • Function calling / tool use via tools + tool_choice request fields with tool_calls in the response
  • JSON mode via response_format: { type: "json_object" } or the newer structured-output schema variant
  • Standard error shapes — HTTP status codes (400, 401, 402, 403, 429, 500, 502, 503), structured error body with type and message

The common-but-not-universal surface:

  • POST /v1/embeddings — most gateways implement; many alternative providers skip or implement only partially
  • POST /v1/completions — the older legacy text-completion endpoint. Mostly deprecated; some compatibility layers still expose it for backwards compat
  • The usage block in responses with prompt_tokens, completion_tokens, total_tokens — universally returned but some implementations skip the more recent cached_tokens field

The rarely-compatible surface:

  • Assistants API — OpenAI's threading + tooling abstraction. Almost never implemented by alternative providers
  • Fine-tuning endpoints — provider-specific; not standardised
  • Files API for batch processing — OpenAI-specific
  • Audio (Whisper, TTS) — rarely implemented by chat-completion alternatives
  • Vision-specific endpoints — when not part of chat completions

If your application uses anything in the "rarely compatible" category, factor that into the migration decision.

The most common gotchas

The five patterns that trip up teams adopting an OpenAI-compatible alternative:

1. Model names don't carry across

"gpt-5-4" works against OpenAI; against an OpenAI-compatible Anthropic endpoint, you'd use "claude-sonnet-4-7" or whatever the equivalent is. Some gateways accept OpenAI-style model names and map them to internal equivalents; many require gateway-native names. Test the model name against the destination provider before assuming it works.

2. Default parameter values differ

OpenAI defaults temperature to 1.0 if unset; Anthropic defaults to 1.0 too but the behaviour at 1.0 differs subtly between models. If your code assumes a particular model behaviour at default settings, validate after migration.

3. Tool-calling response shape varies

The OpenAI spec puts tool_calls on message.tool_calls (an array of tool-call objects with id, type, function.name, function.arguments). Alternative providers' OpenAI-compatible shims sometimes massage their native tool-call format into this shape, but quirks remain. Streaming tool calls especially: chunked delivery of partial tool-call arguments doesn't always match OpenAI's exact streaming shape.

4. Streaming error handling

Errors mid-stream are surfaced differently across providers. OpenAI sends an error chunk in the SSE stream with a structured payload; some compatibility shims close the connection without an error chunk, leaving the caller to detect a truncated stream. Defensive code should handle both: detect both data: {"error": ...} chunks and abrupt stream closure.

5. System message handling

Anthropic native API treats the system message as a top-level field, not as the first message in the array. The OpenAI-compatible shim translates this — but if you're using the Anthropic native API (not via a compatibility shim), the request shape differs. Pure OpenAI-compatible code carries across; code that calls Anthropic native doesn't.

Migration patterns

Switching between OpenAI-compatible endpoints is one of the easier migrations in cloud infrastructure. The shape is consistent across most transitions:

# Pattern 1 — Direct OpenAI to direct alternative provider
from openai import OpenAI

client = OpenAI(
    base_url="https://api.anthropic.com/v1/openai",  # via Anthropic's OpenAI-compat shim
    api_key=ANTHROPIC_API_KEY,
)
# Then change "gpt-5-4" to "claude-sonnet-4-7" in your model parameter
# Pattern 2 — Direct provider to gateway (multi-provider routing)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ssimplifi.com/v1",  # or portkey, helicone, litellm proxy, openrouter
    api_key=PRISM_API_KEY,
    default_headers={"X-Prism-Mode": "balanced"},  # gateway-specific extension
)
# Gateway picks the model based on mode + classifier; or you can pin via X-Prism-Model-Prefer
# Pattern 3 — Between two gateways
# 1. Identify the gateway-specific extensions you depend on
# 2. Map them to the new gateway's equivalents
# 3. Update base_url + api_key
# 4. Replace gateway-specific headers
# 5. Verify model name compatibility for any pinned models

The friction in any migration lives in the gateway-specific extensions, not in the core OpenAI protocol. Migrating from Portkey's x-portkey-provider to Prism's X-Prism-Mode is a string replacement; migrating from a virtual-key-based routing model to a mode-based one is a structural change.

For deeper migration walkthroughs, see Prism vs Portkey, Prism vs LiteLLM, Prism vs OpenRouter.

Why this matters for vendor lock-in

The "S3-compatible" parallel is informative on lock-in. Once S3-compatible APIs proliferated, the cost of switching object stores dropped from "months of engineering" to "a config change." The same dynamic applies to LLM APIs in 2026.

Lock-in lives in gateway-specific extensions, not in the core protocol. If your code uses client.chat.completions.create() with stock OpenAI SDK calls, switching gateways is trivial. If your code uses X-Prism-Mode headers (or any gateway-specific routing primitive), that's where the lock-in sits — and even that lock-in is light, because most gateways have rough functional equivalents.

The real lock-in is operational. Migrating a managed-billing balance to a different gateway requires reconciling spend, re-onboarding the team, re-creating dashboard configurations. Migrating policy rules and audit history doesn't carry across cleanly. Migrating prompt-management libraries (Portkey's prompt templates, for instance) requires manual rebuild on the destination platform.

Vendor relationships matter more than code coupling. A long-running support relationship, contractual SLAs, and vendor-side commitment to your roadmap are harder to migrate than the request code. This is why most teams pick a gateway and stay for years even when the underlying protocol makes switching trivial.

How Prism implements OpenAI-compatibility

Prism is OpenAI-compatible at the request layer with two strategic extensions: routing modes (X-Prism-Mode) and feature tags (X-Prism-Tags). The core API surface:

  • POST /v1/chat/completions — standard request body; streaming; tool use; JSON mode; structured outputs
  • POST /v1/embeddings — OpenAI-compatible shape; supports multiple embedding models
  • GET /v1/public/models — list of available models with metadata
  • Standard usage block in responses with prompt_tokens, completion_tokens, total_tokens, cache_read_tokens, cache_creation_tokens (the latter two surface provider-native cache discounts)
  • Standard error shapes — 4xx for client errors, 5xx for upstream provider errors, structured error body with type and message

Prism-specific extensions (all optional — don't affect basic OpenAI-compatible usage):

  • X-Prism-Mode: eco | balanced | sport | fusion — picks the routing intent. The classifier picks the specific model based on mode + task type
  • X-Prism-Model-Prefer: <slug> — Pro+ only; pins a specific model for the request, overriding the classifier
  • X-Prism-Tags: key=value,key=value — up to 10 tags for cost attribution
  • X-Prism-Cache: on | off — opt out of caching for this request (e.g. for evaluation runs)
  • X-Prism-Cache-TTL: <seconds> — override the default cache TTL
  • X-Prism-Cache-Threshold: <0.0-1.0> — Pro+ only; override the semantic-cache cosine threshold
  • X-Prism-Session: <session-id> — opt into Prism's session-memory feature
  • X-Prism-Feedback-Id: <uuid> — returned in response; correlate with feedback POSTs

Code that uses pure OpenAI SDK without any of these headers works fine — Prism applies sensible defaults (balanced mode, caching on, default TTL).

VERIFY (founder): confirm the embedding endpoint compatibility (POST /v1/embeddings) and the exact list of supported X-Prism-* headers in the current production deployment.

Decision framework

If you're deciding what to use for an OpenAI-compatible application:

  1. Calling one provider directly with no gateway in the middle? Use the provider's native SDK (openai-python for OpenAI, anthropic-python for Anthropic). You give up multi-provider routing, but you get the most stable interface to that specific provider.
  2. Calling multiple providers? Use a gateway (Prism, Portkey, Helicone, LiteLLM, OpenRouter, Cloudflare AI Gateway). The gateway centralises routing, caching, observability, governance. See the AI gateway comparison for which gateway fits your concern.
  3. Worried about lock-in? Stick to pure OpenAI-compatible code paths — no gateway-specific extensions. Then switching gateways is a base-URL + key change. The trade is you give up the gateway-specific features.
  4. Want to use Anthropic features (cache_control, citations, etc.)? Either call Anthropic native API or use a gateway that proxies the Anthropic-specific request fields through cleanly. Most gateways pass them through; verify against the gateway's docs.

Where to go next

If you're comparing OpenAI-compatible gateways: Prism vs Portkey, Prism vs Helicone, Prism vs LiteLLM, Prism vs OpenRouter, Prism vs Cloudflare AI Gateway, and the full AI gateway comparison guide.

If you're combining OpenAI-compatibility with cost engineering: AI API caching covers the cache layers that work across any OpenAI-compatible gateway.

If you want to model migration savings: savings calculator.


Frequently asked questions

Is OpenAI's API the only standard for LLM APIs in 2026?

Effectively yes for production usage. Some specific vendor APIs (Anthropic native with cache_control, Google Gemini native with grounding) have features that don't have clean OpenAI-compatible equivalents, so those features require vendor-native calls. But the core chat-completion + streaming + function-calling surface is universally OpenAI-compatible. Other protocols (LangChain's framework, custom enterprise APIs) exist but they sit at a different abstraction layer.

Does OpenAI control the spec?

Not formally. OpenAI publishes their API reference; alternative providers and gateways implement compatibility against that reference. There's no governance body; there's no formal versioning; there's no committee. Compatibility is a convention enforced by market pressure. OpenAI could in principle break the convention with a new release, but they have strong incentives not to — too much of the ecosystem depends on the current shape.

Can I run multiple OpenAI-compatible gateways from one application?

Yes. The OpenAI SDK accepts a base_url parameter that you can configure per-client. Many production deployments instantiate multiple OpenAI clients — one for Prism (production gateway), one for OpenRouter (research/experimentation), one for a self-hosted vLLM endpoint (cost-sensitive workloads). The OpenAI-compatible surface makes this seamless.

What's the difference between "OpenAI-compatible" and "OpenAI-equivalent"?

OpenAI-compatible means the API shape matches — same endpoints, same request/response schemas, the OpenAI SDK works against it. OpenAI-equivalent would mean the behaviour matches — same model outputs for the same prompts, same tokenisation, same edge-case handling. Compatible is universal; equivalent is impossible (different models produce different outputs by design).

Does the OpenAI-compatible substrate cover the new Assistants API?

Mostly no. The Assistants API is OpenAI-specific — threads, tools, file management, etc. Alternative providers and gateways rarely implement it. If your code uses Assistants, you're more locked into OpenAI than if you use plain Chat Completions.

How does this affect the AI gateway market specifically?

It makes gateway switching easier and feature differentiation more important. Since the core protocol is universal, gateways compete on the surrounding surface — caching, observability, routing, governance, pricing — rather than on the proxy capability itself. Healthy market dynamic; aligns with the gateway-comparison framing in the comparison guide.

What about streaming tool calls?

The trickiest part of OpenAI compatibility. OpenAI streams tool-call arguments as chunked JSON deltas; alternative providers don't always match the chunking pattern exactly. Production code that handles streaming tool calls should test against each gateway/provider it depends on rather than assuming byte-identical streaming behaviour.

Can I migrate a production application without downtime?

Yes, with the standard blue-green migration pattern. Stand up the new gateway endpoint alongside the existing one; route 5% of traffic to the new endpoint with feature-flag control; monitor error rates, latency, and quality signals; gradually shift traffic to 100%; decommission the old endpoint. The OpenAI-compatibility makes this much cleaner than migrations at other layers — you don't need to rewrite application code, just shift the base URL gradually.


If you're picking which OpenAI-compatible gateway to use, the AI gateway comparison covers the seven candidates with their wedges. The OpenAI-compatible endpoint glossary entry covers the term in shorter form.

Deep dives on openai-compatible api

Five cluster posts unpack the sub-topics of this pillar. Each ships independently as part of the content calendar.

See your savings before you sign up

Run our calculator on your own workload. Real provider rates, real cache math, no email gate.

Frequently asked questions

What is openai-compatible api?
The substrate eating the LLM market — implementation, gotchas, replacement guide. Prism covers this topic from the perspective of an AI API proxy that ships measured production data on every request — not vendor estimates.
How does Prism handle openai-compatible api?
Prism is an OpenAI-compatible AI API proxy that addresses openai-compatible api directly. See the deep-dive posts in this guide for the per-sub-topic implementation details, or jump to the savings calculator to model the impact on your workload.