Question 1

Can I run only exact-match and skip semantic?

Accepted Answer

Yes, and it's a reasonable starting point. Exact-match is cheap, correct, and catches the deterministic-traffic slice. Adding semantic later (when paraphrasable intent becomes a real workload) is straightforward. The opposite, running only semantic and skipping exact-match, is rarely the right call, because exact-match carries none of the false-positive risk that semantic does.

Question 2

What's the false-positive rate on semantic caching at threshold 0.95?

Accepted Answer

Typically 1-3% on production workloads with broad intent diversity, and lower on narrow-domain workloads (e.g. a chatbot for one product's documentation). A rate of 5% or more usually means the threshold is too low for the workload or the embedding model isn't a good fit for the domain. Re-validate quarterly by having human reviewers judge a sample of semantic cache hits.
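One way to turn sampled human judgments into a number is sketched below. It assumes each sampled semantic hit has been labeled `True` when the reviewer judged the cached answer wrong for the new query. The function name `false_positive_rate` is hypothetical, and the Wilson score interval is a choice made for this sketch (it behaves reasonably for small rates and small samples), not something the answer above prescribes.

```python
import math


def false_positive_rate(
    judgments: list[bool], z: float = 1.96
) -> tuple[float, float, float]:
    """Return (observed_rate, lower_bound, upper_bound).

    judgments[i] is True when a reviewer marked sampled hit i as a false
    positive. The bounds are a Wilson score interval; z=1.96 corresponds
    to roughly 95% confidence.
    """
    n = len(judgments)
    if n == 0:
        raise ValueError("need at least one judged sample")
    p = sum(judgments) / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)
```

If the upper bound sits comfortably below the level you can tolerate, the threshold is probably fine for now. If the interval is too wide to say, the sample is too small.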

Question 3

Which layer matters more for cost reduction?

Accepted Answer

Semantic typically catches more traffic, so on most workloads it delivers the larger absolute savings. Exact-match has higher value per hit because it's correct by definition (no false-positive risk), so its savings are more reliable. Production deployments run both because the value of stacking them exceeds that of either alone.
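The stacking claim can be made concrete with a little arithmetic. The sketch below assumes exact-match is consulted first and semantic only sees the requests that exact-match missed. Both the function name and the rates passed in are hypothetical placeholders, not measurements.

```python
def stacked_hit_rate(exact_hit_rate: float, semantic_hit_rate_on_misses: float) -> float:
    """Combined hit rate when semantic only handles exact-match misses.

    exact_hit_rate: fraction of all requests served by exact-match.
    semantic_hit_rate_on_misses: fraction of the remaining requests
        served by semantic.
    """
    for rate in (exact_hit_rate, semantic_hit_rate_on_misses):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be between 0 and 1")
    return exact_hit_rate + (1.0 - exact_hit_rate) * semantic_hit_rate_on_misses


# Made-up inputs purely for illustration.
combined = stacked_hit_rate(0.10, 0.30)
```

With these placeholder inputs the combined rate is higher than either layer alone, which is the shape of the argument above. Plug in your own measured rates to see the actual effect.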

Question 4

Does Prism run both?

Accepted Answer

Yes: exact-match in Redis with SHA-256 fingerprints, semantic in Upstash Vector with BGE-small embeddings at a cosine threshold of 0.95, plus a third layer (provider-native passthrough) that captures Anthropic and OpenAI prompt-cache discounts. All three run concurrently, by default, on every paid request.
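To show how an exact layer and a semantic layer fit together, here is a simplified in-memory sketch. It is not Prism's implementation. The store is a plain dict and list rather than Redis or a vector database, the `embed` callable is supplied by the caller (any embedding model could stand behind it), and the layers are checked one after the other rather than concurrently. The class and method names are hypothetical.

```python
import hashlib
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoLayerCache:
    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.95):
        self.embed = embed          # caller-supplied embedding function
        self.threshold = threshold  # minimum cosine similarity for a semantic hit
        self.exact: dict[str, str] = {}
        self.semantic: list[tuple[Vector, str]] = []

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def store(self, prompt: str, response: str) -> None:
        self.exact[self._key(prompt)] = response
        self.semantic.append((self.embed(prompt), response))

    def lookup(self, prompt: str) -> tuple[Optional[str], str]:
        """Return (response, layer) where layer is 'exact', 'semantic', or 'miss'."""
        key = self._key(prompt)
        if key in self.exact:
            return self.exact[key], "exact"
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.semantic:  # linear scan; fine for a sketch
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            return best_response, "semantic"
        return None, "miss"
```

Checking exact-match first means an identical request never pays for an embedding call, and it never depends on the threshold being tuned well.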

Exact vs semantic cache

The 60-second answer

How exact-match works

How semantic-match works

When each wins

Combined effect

Frequently asked questions
