OpenAI-compatible endpoint
An API endpoint that speaks the OpenAI Chat Completions wire protocol, so any OpenAI SDK works against it without code changes.
What it means
An OpenAI-compatible endpointis an HTTP API that implements the same request and response shape as OpenAI's Chat Completions API (POST /v1/chat/completions). Any client that works with OpenAI — official SDKs (openai-python, openai-node), third-party libraries, custom HTTP code — works with the compatible endpoint by changing only the base URL and API key. The endpoint may be hosted by a competitor (Anthropic, Google, Mistral, Cohere), an aggregator gateway (Prism, Portkey, Helicone, LiteLLM, OpenRouter, Cloudflare AI Gateway), or a self-hosted runtime (vLLM, Text Generation Inference, Ollama).
The compatibility surface is conventional — there's no formal specification published by OpenAI. Each provider implements the parts of the API that matter for their use case and skips or extends others. Most providers implement chat completions, streaming, function calling, and JSON mode. Embeddings, fine-tuning, and the Assistants API are commonly skipped.
Why this won the substrate war
By mid-2026 OpenAI-compatible became the de facto interoperability layer for LLM APIs — the analog of "S3-compatible" for object storage. The reason is straightforward: OpenAI's SDK was the first widely-adopted client, the developer mindshare landed there, and every provider that wanted to be a credible alternative had to support callers who'd written their code against the OpenAI SDK already. The substrate consolidated around the protocol rather than around the leading provider.
The practical consequence: switching providers in a production application is typically a 5-minute change — base URL, API key, possibly a model-name swap. The lock-in that used to define cloud services doesn't exist at the AI API layer the way it does at the database or compute layer.
What's actually compatible
The common-denominator surface that essentially every OpenAI-compatible endpoint implements:
POST /v1/chat/completionswith the standard messages array, model, temperature, top_p, max_tokens, stop, stream parameters- Streaming via Server-Sent Events with the standard data: prefix and chunked response format
- Function calling / tool use with the tools + tool_choice request fields
- JSON mode via response_format
- Standard error shapes (HTTP status codes; structured error body with type + message)
What's sometimes compatible: POST /v1/embeddings (some gateways implement; many alternative providers skip), POST /v1/completions(the older legacy endpoint; mostly deprecated). What's almost never compatible across non-OpenAI products: Assistants API, fine-tuning endpoints, audio transcription. If your code uses these, factor compatibility into the comparison.
Where compatibility breaks
The two patterns that bite teams adopting an OpenAI-compatible alternative:
Model names."gpt-4o" works against OpenAI; against a gateway that proxies Anthropic, you'd use "claude-sonnet" or "claude-opus" instead. Some gateways accept OpenAI model names and map them to their internal equivalents; others require the gateway-native names. Verify before migration.
Extended request parameters.Gateways like Prism extend the protocol via HTTP headers (X-Prism-Mode for routing intent, X-Prism-Tags for attribution, X-Prism-Cache-TTL for cache control). These aren't part of the OpenAI protocol, so they don't conflict with standard usage — but they're also not portable across gateways. Code that depends on a specific gateway's extensions takes more work to migrate.