Model Routing Recommender

Pick the right routing config for your latency + cost tolerance + task mix.

Your workload profile

300ms (snappy)2500ms5000ms (lenient)

Total: 100% (should sum to ~100)

simple40%
code20%
reasoning25%
complex15%

Recommended config

ModeBALANCED
Quality floor7/10
Expected p95 latency470 ms
Expected monthly cost$12
Baseline (all gpt-4o)$500
Savings vs baseline$488 (98%)

Per-task routing

  • simpleLlama 3.1 8B (Groq)
    Groq · q=7/10300ms · $0.0001/req
  • codeLlama 4 Scout (Groq)
    Groq · q=7/10750ms · $0.0002/req
  • reasoningGemini 2.5 Flash
    Google · q=7/10500ms · $0.0001/req
  • complexGemini 2.5 Flash
    Google · q=7/10500ms · $0.0001/req

Recommendations are computed from Prism's catalog of 23 models across 8 providers, weighted by your task mix. Latency p95 and per-task quality scores are illustrative defaults from v1.7-A benchmark output; actual production picks may differ slightly as the benchmark refreshes quarterly. Set X-Prism-Mode to balanced in your request headers to get this routing automatically.

Frequently asked questions

Is Model Routing Recommender really free?
Yes — fully free, no email gate, no signup, no sales call. All Prism tools live on this page and similar dedicated tool pages.
What data does Model Routing Recommender use?
The calculations use current public provider pricing (Anthropic, OpenAI, Google) and Prism's published cache-economics math from production traffic. Nothing about your usage is captured — the calculator runs entirely in your browser.