Last updated:

OpenAI-compatible endpoint

An API endpoint that speaks the OpenAI Chat Completions wire protocol, so any OpenAI SDK works against it without code changes.

What it means

An OpenAI-compatible endpointis an HTTP API that implements the same request and response shape as OpenAI's Chat Completions API (POST /v1/chat/completions). Any client that works with OpenAI — official SDKs (openai-python, openai-node), third-party libraries, custom HTTP code — works with the compatible endpoint by changing only the base URL and API key. The endpoint may be hosted by a competitor (Anthropic, Google, Mistral, Cohere), an aggregator gateway (Prism, Portkey, Helicone, LiteLLM, OpenRouter, Cloudflare AI Gateway), or a self-hosted runtime (vLLM, Text Generation Inference, Ollama).

The compatibility surface is conventional — there's no formal specification published by OpenAI. Each provider implements the parts of the API that matter for their use case and skips or extends others. Most providers implement chat completions, streaming, function calling, and JSON mode. Embeddings, fine-tuning, and the Assistants API are commonly skipped.

Why this won the substrate war

By mid-2026 OpenAI-compatible became the de facto interoperability layer for LLM APIs — the analog of "S3-compatible" for object storage. The reason is straightforward: OpenAI's SDK was the first widely-adopted client, the developer mindshare landed there, and every provider that wanted to be a credible alternative had to support callers who'd written their code against the OpenAI SDK already. The substrate consolidated around the protocol rather than around the leading provider.

The practical consequence: switching providers in a production application is typically a 5-minute change — base URL, API key, possibly a model-name swap. The lock-in that used to define cloud services doesn't exist at the AI API layer the way it does at the database or compute layer.

What's actually compatible

The common-denominator surface that essentially every OpenAI-compatible endpoint implements:

  • POST /v1/chat/completions with the standard messages array, model, temperature, top_p, max_tokens, stop, stream parameters
  • Streaming via Server-Sent Events with the standard data: prefix and chunked response format
  • Function calling / tool use with the tools + tool_choice request fields
  • JSON mode via response_format
  • Standard error shapes (HTTP status codes; structured error body with type + message)

What's sometimes compatible: POST /v1/embeddings (some gateways implement; many alternative providers skip), POST /v1/completions(the older legacy endpoint; mostly deprecated). What's almost never compatible across non-OpenAI products: Assistants API, fine-tuning endpoints, audio transcription. If your code uses these, factor compatibility into the comparison.

Where compatibility breaks

The two patterns that bite teams adopting an OpenAI-compatible alternative:

Model names."gpt-4o" works against OpenAI; against a gateway that proxies Anthropic, you'd use "claude-sonnet" or "claude-opus" instead. Some gateways accept OpenAI model names and map them to their internal equivalents; others require the gateway-native names. Verify before migration.

Extended request parameters.Gateways like Prism extend the protocol via HTTP headers (X-Prism-Mode for routing intent, X-Prism-Tags for attribution, X-Prism-Cache-TTL for cache control). These aren't part of the OpenAI protocol, so they don't conflict with standard usage — but they're also not portable across gateways. Code that depends on a specific gateway's extensions takes more work to migrate.

See your savings before you sign up

Run our calculator on your own workload. Real provider rates, real cache math, no email gate.

Frequently asked questions

Is there a formal OpenAI-compatible spec?
No formal spec published by OpenAI. The conventional surface is documented across the broad ecosystem: openai.com's API reference is the de facto reference; alternative providers publish their own subset documentation. Implementations vary in completeness — chat completions is universal; embeddings is common; fine-tuning is rare.
Why doesn't Anthropic just publish an OpenAI-compatible endpoint directly?
Anthropic publishes their own native API (https://api.anthropic.com) which is structurally similar but not byte-compatible — different message format (system as a top-level field, not a message; content blocks instead of strings in some cases), different streaming envelope, different error shapes. Gateways like Prism handle the translation. As of 2026 Anthropic offers an OpenAI-compatibility shim for some endpoints, but the canonical Anthropic SDK is what most code uses.
Does OpenAI-compatible cover streaming and function calling?
Yes, on essentially every implementation. Streaming uses the same Server-Sent Events format with `data:` prefix and `[DONE]` terminator. Function calling uses the `tools` and `tool_choice` request fields with `tool_calls` in the response. Both work across OpenAI's own endpoint and the major gateways and alternative providers.
How does this affect vendor lock-in?
Dramatically reduced compared to other cloud services. Switching from one OpenAI-compatible endpoint to another is typically a base-URL + key change. Lock-in lives in gateway-specific extensions (custom headers for routing, observability, caching) rather than in the core protocol — and most gateways have rough functional equivalents.