Model Routing Recommender
Pick the right routing config for your latency + cost tolerance + task mix.
Your workload profile
300ms (snappy)2500ms5000ms (lenient)
Total: 100% (should sum to ~100)
simple40%
code20%
reasoning25%
complex15%
Recommended config
ModeBALANCED
Quality floor7/10
Expected p95 latency470 ms
Expected monthly cost$12
Baseline (all gpt-4o)$500
Savings vs baseline$488 (98%)
Per-task routing
- simpleLlama 3.1 8B (Groq)Groq · q=7/10300ms · $0.0001/req
- codeLlama 4 Scout (Groq)Groq · q=7/10750ms · $0.0002/req
- reasoningGemini 2.5 FlashGoogle · q=7/10500ms · $0.0001/req
- complexGemini 2.5 FlashGoogle · q=7/10500ms · $0.0001/req
Recommendations are computed from Prism's catalog of 23 models across 8 providers, weighted by your task mix. Latency p95 and per-task quality scores are illustrative defaults from v1.7-A benchmark output; actual production picks may differ slightly as the benchmark refreshes quarterly. Set X-Prism-Mode to balanced in your request headers to get this routing automatically.
Frequently asked questions
- Is Model Routing Recommender really free?
- Yes — fully free, no email gate, no signup, no sales call. All Prism tools live on this page and similar dedicated tool pages.
- What data does Model Routing Recommender use?
- The calculations use current public provider pricing (Anthropic, OpenAI, Google) and Prism's published cache-economics math from production traffic. Nothing about your usage is captured — the calculator runs entirely in your browser.