Task-type routing

Classifying each request by task category (code, summary, chat, code-fix, etc.) and routing it to the model that performs best in that category.

How it works

Task-type routing classifies an incoming LLM request into a task type (e.g. simple Q&A, code generation, reasoning, complex multi-step) and uses that classification to pick a model from a calibrated routing table. It addresses a structural fact: different models dominate different task types. Small fast models handle simple tasks well and frontier models earn their price on reasoning-heavy work, so a single per-application model choice over-pays for some tasks while under-serving others.

A task classifier is typically small and fast: a fine-tuned mini-LM, an embedding-based similarity score against a labelled corpus, or a deterministic rule set over request features. It runs in 5-20ms per request and predicts a task type from a fixed taxonomy. The router then looks up the (task_type, mode) cell in a routing table to pick the model, where mode is the caller's cost-quality setting. The classifier's output isn't the answer to the user's question; it's a routing signal.
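
As a rough illustration of the similarity-against-a-labelled-corpus approach, here is a minimal sketch. It makes several assumptions: the corpus is a handful of made-up prompts, and `embed` is a toy word-count vector standing in for a real embedding model. It shows the shape of the technique, not how any production classifier is built.

```python
import math
from collections import Counter

# Hypothetical labelled corpus: a few example prompts per task type.
LABELLED_CORPUS = {
    "simple": ["translate this sentence into french",
               "extract the date from this email"],
    "code": ["write a python function that parses json",
             "fix the bug in this javascript loop"],
    "reasoning": ["prove that the sum of two even numbers is even",
                  "plan the steps to solve this logic puzzle"],
    "complex": ["compare these three reports and synthesise the findings",
                "analyse this long contract across all sections"],
}

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(prompt: str) -> str:
    """Return the task type whose closest labelled example scores highest."""
    query = embed(prompt)
    scores = {
        task: max(cosine(query, embed(example)) for example in examples)
        for task, examples in LABELLED_CORPUS.items()
    }
    return max(scores, key=scores.get)

if __name__ == "__main__":
    print(classify("write a python function that sorts a list"))
```

The returned label is only a routing signal: the caller would pass it to a table lookup rather than show it to the user.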

What a good task taxonomy looks like

A useful task taxonomy is small (4-6 categories), distinct (each category produces different optimal model picks), and stable (the classifier can reliably distinguish them). Prism uses four:

  • simple — direct Q&A, extraction, formatting, translation. Small models often suffice; the task doesn't need reasoning depth.
  • code — code generation, code explanation, code review. Code-specialised models (or general models good at code) dominate this cell.
  • reasoning — multi-step logical inference, math, planning. Frontier models earn their price here; smaller models often produce confidently-wrong answers.
  • complex — long-context analysis, multi-document synthesis, intricate research. The most expensive cell; frontier models are typically the right choice.

A taxonomy that's too granular (10-20 categories) makes the classifier unreliable and the routing table impossible to maintain. A taxonomy that's too coarse (2 categories — "simple" and "hard") doesn't capture enough variation to drive real savings.
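
One way to test the "distinct" property is to look for categories whose picks are identical in every mode, since those categories are not driving different routing decisions. The sketch below assumes the four-type taxonomy and three modes described in this article; the table contents and model names are placeholders invented for the example.

```python
from enum import Enum

class TaskType(Enum):
    SIMPLE = "simple"
    CODE = "code"
    REASONING = "reasoning"
    COMPLEX = "complex"

MODES = ("eco", "balanced", "sport")

# Placeholder picks per task type, in MODES order (not real routing data).
ROWS = {
    TaskType.SIMPLE: ("small-model", "small-model", "mid-model"),
    TaskType.CODE: ("code-model", "code-model", "frontier-model"),
    TaskType.REASONING: ("mid-model", "frontier-model", "frontier-model"),
    TaskType.COMPLEX: ("mid-model", "frontier-model", "frontier-model"),
}
EXAMPLE_TABLE = {
    (task, mode): model
    for task, picks in ROWS.items()
    for mode, model in zip(MODES, picks)
}

def redundant_categories(table):
    """Return pairs of task types whose picks match in every mode."""
    rows = {t: tuple(table[(t, m)] for m in MODES) for t in TaskType}
    types = list(TaskType)
    return [
        (a, b)
        for i, a in enumerate(types)
        for b in types[i + 1:]
        if rows[a] == rows[b]
    ]

if __name__ == "__main__":
    for a, b in redundant_categories(EXAMPLE_TABLE):
        print(f"{a.value} and {b.value} route identically")
```

In this made-up table, reasoning and complex route identically, which would be a hint that the split is not earning its place. Real benchmark data may well separate them.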

The routing table

The other half of the system is the routing table — a 2-dimensional grid of (task_type, mode) cells, each filled with a model pick. The mode dimension expresses the caller's cost-quality intent (eco / balanced / sport in Prism's case); the task_type dimension expresses the classifier's judgment about the request.
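
In code, the grid can be as plain as a dictionary keyed by (task_type, mode). The following is a sketch only: the task types and modes come from this article, while the model names are placeholders and the error handling is an assumption about how a router might behave.

```python
TASK_TYPES = ("simple", "code", "reasoning", "complex")
MODES = ("eco", "balanced", "sport")

# Placeholder model names; a real table would be filled from benchmark data.
ROUTING_TABLE = {
    ("simple", "eco"): "small-model",
    ("simple", "balanced"): "small-model",
    ("simple", "sport"): "mid-model",
    ("code", "eco"): "code-model",
    ("code", "balanced"): "code-model",
    ("code", "sport"): "frontier-model",
    ("reasoning", "eco"): "mid-model",
    ("reasoning", "balanced"): "frontier-model",
    ("reasoning", "sport"): "frontier-model",
    ("complex", "eco"): "mid-model",
    ("complex", "balanced"): "frontier-model",
    ("complex", "sport"): "frontier-model",
}

def route(task_type: str, mode: str) -> str:
    """Look up the model pick for one (task_type, mode) cell."""
    try:
        return ROUTING_TABLE[(task_type, mode)]
    except KeyError:
        raise ValueError(f"no routing cell for ({task_type!r}, {mode!r})") from None

if __name__ == "__main__":
    print(route("code", "balanced"))
```

Keeping the table as data rather than branching logic means a recalibration only replaces cell values and leaves the router code alone.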

The table is calibrated from measured benchmark data: per-model quality and per-model cost across each task type, with picks chosen by an optimisation rule (cheapest model meeting quality floor for eco; best quality-per-cost for balanced; highest quality for sport). The benchmark gets refreshed quarterly to catch model-catalog changes — new models, deprecated models, pricing shifts.
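
The three optimisation rules can be sketched as one selection function over per-model benchmark rows. Everything numeric here is invented for illustration, and two details are assumptions: the 0.7 quality floor, and the fallback to the highest-quality model when nothing meets the floor.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Benchmark:
    model: str
    quality: float  # benchmark score on one task type, 0 to 1 (hypothetical scale)
    cost: float     # relative cost per request (hypothetical unit, must be > 0)

def pick(benchmarks: list[Benchmark], mode: str, quality_floor: float = 0.7) -> str:
    """Choose a model for one task type according to the mode's rule."""
    if mode == "eco":
        # Cheapest model that meets the quality floor.
        eligible = [b for b in benchmarks if b.quality >= quality_floor]
        if not eligible:
            # Assumption: fall back to the best available quality.
            return max(benchmarks, key=lambda b: b.quality).model
        return min(eligible, key=lambda b: b.cost).model
    if mode == "balanced":
        # Best quality per unit cost.
        return max(benchmarks, key=lambda b: b.quality / b.cost).model
    if mode == "sport":
        # Highest quality regardless of cost.
        return max(benchmarks, key=lambda b: b.quality).model
    raise ValueError(f"unknown mode: {mode!r}")

# Made-up benchmark rows for a single task type.
CODE_BENCHMARKS = [
    Benchmark("small-model", quality=0.72, cost=1.0),
    Benchmark("mid-model", quality=0.90, cost=1.2),
    Benchmark("frontier-model", quality=0.95, cost=10.0),
]

if __name__ == "__main__":
    for mode in ("eco", "balanced", "sport"):
        print(mode, pick(CODE_BENCHMARKS, mode))
```

Running `pick` once per task type and mode fills the whole grid. A plain quality-per-cost ratio tends to favour cheap models, so a real calibration might also apply a floor in balanced mode; that is a design choice the article does not specify.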

Why this beats hand-coded rules

A common starting point is hand-coded routing: "if the request contains code blocks, use GPT-4o; otherwise use Claude Sonnet." Simple, transparent, easy to debug. The case for task-type routing shows up when the rule set grows beyond 5-10 rules and starts conflicting, or when the task distribution is too varied for hand-coded heuristics to capture cleanly. Most mature production deployments end up with task-type routing for the classifier-driven 80% of decisions plus a small set of explicit overrides for the remaining edge cases.
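
The mixed setup described above, classifier-driven routing with a few explicit overrides, might look like the sketch below. The override rule, the stub classifier, and the model names are all hypothetical; the point is only the ordering, with overrides checked first and the table used for everything else.

```python
from typing import Callable

# Each override pairs a predicate on the prompt with a forced model.
# The rule below is an invented example, not a recommended policy.
OVERRIDES: list[tuple[Callable[[str], bool], str]] = [
    (lambda prompt: "legal review" in prompt.lower(), "frontier-model"),
]

def choose_model(
    prompt: str,
    classify: Callable[[str], str],
    table: dict[tuple[str, str], str],
    mode: str = "balanced",
) -> str:
    """Apply explicit overrides first, then fall back to task-type routing."""
    for matches, forced_model in OVERRIDES:
        if matches(prompt):
            return forced_model
    return table[(classify(prompt), mode)]

if __name__ == "__main__":
    # Stub classifier and a one-row table, for demonstration only.
    stub_classify = lambda prompt: "simple"
    table = {("simple", "balanced"): "small-model"}
    print(choose_model("Summarise this memo", stub_classify, table))
    print(choose_model("Legal review of this NDA", stub_classify, table))
```

Keeping the override list short and ordered avoids the rule conflicts that push teams away from fully hand-coded routing in the first place.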

Frequently asked questions

How accurate is the classifier?
Production classifiers running on the 4-category Prism taxonomy land around 88-93% top-1 accuracy on broad-domain traffic, with the bulk of errors being adjacent (simple/code or reasoning/complex boundaries). Adjacent errors typically cost little — the alternate model in the routing table for an adjacent category is usually a reasonable choice for either task type.
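
To check these two figures on your own traffic, you can score a labelled sample for top-1 accuracy and for the share of errors that fall on an adjacent boundary. This is a minimal sketch: the adjacency pairs follow the boundaries named above, and the sample data is invented.

```python
# Boundaries treated as "adjacent", per the answer above.
ADJACENT = {
    frozenset({"simple", "code"}),
    frozenset({"reasoning", "complex"}),
}

def evaluate(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Score (true_label, predicted_label) pairs."""
    if not pairs:
        raise ValueError("need at least one labelled example")
    errors = [(true, pred) for true, pred in pairs if true != pred]
    adjacent = [e for e in errors if frozenset(e) in ADJACENT]
    return {
        "top1_accuracy": (len(pairs) - len(errors)) / len(pairs),
        "adjacent_error_share": len(adjacent) / len(errors) if errors else 0.0,
    }

# Invented sample: 8 correct, 1 adjacent error, 1 non-adjacent error.
SAMPLE = (
    [("simple", "simple")] * 3
    + [("code", "code")] * 3
    + [("reasoning", "reasoning")] * 2
    + [("reasoning", "complex")]   # adjacent error
    + [("simple", "reasoning")]    # non-adjacent error
)

if __name__ == "__main__":
    print(evaluate(SAMPLE))
```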
Does running the classifier on every request slow things down?
Marginally. The classifier adds 5-20ms p95 to the request path, against model calls that run 200-2000ms. The relative overhead tops out at 10% (20ms against a 200ms call) and falls below 1% for the slowest calls, and the routing savings dominate the latency cost by 50-100x.
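
The overhead arithmetic is easy to reproduce for your own latency profile. The helper below is a small sketch using only the ranges quoted in this answer; the function name is made up for the example.

```python
def overhead_pct(classifier_ms: float, model_ms: float) -> float:
    """Classifier latency as a percentage of the model call's latency."""
    return 100.0 * classifier_ms / model_ms

if __name__ == "__main__":
    # Corners of the quoted ranges: 5-20ms classifier, 200-2000ms model call.
    for classifier_ms in (5, 20):
        for model_ms in (200, 2000):
            pct = overhead_pct(classifier_ms, model_ms)
            print(f"{classifier_ms}ms on a {model_ms}ms call: {pct:.2f}%")
```

The four corners work out to 2.5%, 0.25%, 10% and 1%, so the overhead matters most for fast model calls paired with a slow classifier.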
Can I override the classifier's choice?
Most gateways let you. Prism's X-Prism-Model-Prefer header (Pro+) pins a specific model regardless of the classifier's prediction — useful for cases where the calling code knows something the classifier doesn't (e.g. "this prompt is always for legal review, force a frontier model").
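
On the gateway side, honouring such a header could be as simple as the sketch below. Only the header name comes from this article. The function, the plan check, and the model names are hypothetical, and this is not a description of Prism's actual implementation.

```python
PREFER_HEADER = "x-prism-model-prefer"  # compared case-insensitively

def resolve_model(
    headers: dict[str, str],
    routed_model: str,
    plan_allows_override: bool,
) -> str:
    """Return the pinned model if the caller may override, else the routed pick."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    preferred = normalised.get(PREFER_HEADER)
    if preferred and plan_allows_override:
        return preferred
    return routed_model

if __name__ == "__main__":
    headers = {"X-Prism-Model-Prefer": "frontier-model"}
    print(resolve_model(headers, "small-model", plan_allows_override=True))
    print(resolve_model(headers, "small-model", plan_allows_override=False))
```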
Does task-type routing work with code generation specifically?
Yes. Code is one of the four task types in Prism's taxonomy and a common category in others. The classifier learns to detect code-shaped prompts (presence of code blocks, programming-language syntax, code-related keywords), and the routing table maps the code cell to a code-specialised or code-strong model. Prism uses Codestral, Mistral Medium, or Claude Sonnet for code-cell routing depending on mode and recent benchmark data.
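
The three signals mentioned here (code blocks, syntax, keywords) can be approximated with a few deterministic rules. A trained classifier would learn these features rather than hard-code them, so treat this as a sketch of the idea: the patterns, keyword list, and two-keyword threshold are all assumptions chosen for the example.

```python
import re

# Three consecutive backticks mark a fenced code block.
FENCE = re.compile(r"`{3}")
# A few syntax shapes: Python def, JS function, class header, arrow, trailing semicolon.
SYNTAX = re.compile(
    r"\bdef\s+\w+\s*\(|\bfunction\s+\w*\s*\(|\bclass\s+\w+\s*[:({]|=>|;\s*$",
    re.MULTILINE,
)
# Hypothetical keyword list; a real one would be larger and tuned on traffic.
KEYWORDS = ("python", "javascript", "refactor", "stack trace", "compile", "bug")

def looks_like_code(prompt: str) -> bool:
    """Heuristic: a fence, a syntax match, or at least two code keywords."""
    lowered = prompt.lower()
    has_block = bool(FENCE.search(prompt))
    has_syntax = bool(SYNTAX.search(prompt))
    keyword_hits = sum(keyword in lowered for keyword in KEYWORDS)
    return has_block or has_syntax or keyword_hits >= 2

if __name__ == "__main__":
    print(looks_like_code("Why does def add(a, b): return a - b fail?"))
    print(looks_like_code("Summarise this meeting transcript"))
```

Requiring two keyword hits keeps a prompt like "What is a python snake?" out of the code cell, at the cost of missing some terse code questions.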