Question 1

How is this different from prompt caching as Anthropic and OpenAI describe it?

Accepted Answer

Same thing, different name. "Provider-native caching" is the gateway-evaluator-side term for what providers call "prompt caching" (Anthropic) or "cached tokens" (OpenAI). The wording differs; the mechanic is the same.

Question 2

Do I need to do anything special to enable this?

Accepted Answer

On Anthropic, yes — mark portions of your prompt with cache_control: { type: 'ephemeral' }. On OpenAI, no — caching is automatic on prompts ≥1,024 tokens. The Anthropic write premium (25% above normal) makes the choice deliberate: if you don't expect repeated hits within the 5-minute window, skip the marker.

Question 3

Does the gateway have to do anything, or is this transparent?

Accepted Answer

The gateway has to pass cache-control markers through cleanly (for Anthropic) and read cached-token counts back from response usage blocks (for both). And on the billing side, the gateway has to decide whether to pass the discount to the customer or absorb it. Prism does both — markers pass through, discounts pass to the customer.

Question 4

Is this the same as semantic caching?

Accepted Answer

No — provider-native caching is server-side prefix caching that discounts the model call. Semantic caching is gateway-side response caching that avoids the model call entirely (return a cached response based on embedding similarity). They solve different problems and stack: provider-native discounts the calls that do reach the model; semantic prevents many calls from reaching the model in the first place.

Provider-native caching

How it works

The two implementations

The pass-through question

When it engages

See your savings before you sign up

Frequently asked questions

Related reading

All glossary terms

Read the guides