Task-type routing
Classifying a request's task category (code, summary, chat, code-fix, etc.) and routing it to the model that performs best on that category.
How it works
Task-type routing is the practice of classifying an incoming LLM request into a task type (e.g. simple Q&A, code generation, reasoning, complex multi-step work) and using that classification to pick a model from a calibrated routing table. It addresses a structural fact: different models dominate different task types. Small, fast models handle simple tasks well, while frontier models earn their price on reasoning-heavy work, so a single per-application model choice overpays for some tasks and under-serves others.
A task classifier is typically small and fast: a fine-tuned mini-LM, an embedding-based similarity scorer run against a labelled corpus, or a deterministic rule set over request features. It typically runs in 5-20 ms per request and predicts a task type from a fixed taxonomy. The router then looks up the (task_type, mode) cell in a routing table to pick the model. The classifier's output isn't the answer to the user's question; it's a routing signal.
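As a rough illustration of the simplest of those options, a deterministic rule set over request features might look like the sketch below. The regular expressions, the length threshold, and the name classify_task are all assumptions made for this example, not Prism's actual classifier. The labels assume a four-type taxonomy (simple, code, reasoning, complex).

```python
import re

# Hypothetical feature patterns; a real deployment would tune or learn these.
CODE_PATTERN = re.compile(
    r"`{3}|\bdef \w+\(|Traceback \(most recent call last\)"
)
REASONING_PATTERN = re.compile(
    r"\b(prove|derive|calculate|step by step)\b", re.IGNORECASE
)
LONG_CONTEXT_CHARS = 20_000  # assumed cut-off for "long-context" requests


def classify_task(prompt: str) -> str:
    """Return a task type; this is a routing signal, not an answer."""
    # Rules are checked in priority order; the first match wins.
    if len(prompt) > LONG_CONTEXT_CHARS:
        return "complex"
    if CODE_PATTERN.search(prompt):
        return "code"
    if REASONING_PATTERN.search(prompt):
        return "reasoning"
    return "simple"
```

Rules like these are cheap and easy to audit, but they are brittle; a fine-tuned or embedding-based classifier would replace the body of classify_task while keeping the same contract of prompt in, task type out.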
What a good task taxonomy looks like
A useful task taxonomy is small (4-6 categories), distinct (each category produces different optimal model picks), and stable (the classifier can reliably distinguish them). Prism uses four:
- simple — direct Q&A, extraction, formatting, translation. Small models often suffice; the task doesn't need reasoning depth.
- code — code generation, code explanation, code review. Code-specialised models (or general models good at code) dominate this cell.
- reasoning — multi-step logical inference, math, planning. Frontier models earn their price here; smaller models often produce confidently-wrong answers.
- complex — long-context analysis, multi-document synthesis, intricate research. The most expensive cell; frontier models are typically the right choice.
A taxonomy that's too granular (10-20 categories) makes the classifier less reliable and the routing table hard to maintain. A taxonomy that's too coarse (2 categories, such as "simple" and "hard") doesn't capture enough variation to drive real savings.
The routing table
The other half of the system is the routing table — a 2-dimensional grid of (task_type, mode) cells, each filled with a model pick. The mode dimension expresses the caller's cost-quality intent (eco / balanced / sport in Prism's case); the task_type dimension expresses the classifier's judgment about the request.
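A minimal sketch of such a grid, assuming a plain dictionary keyed by (task_type, mode). The model names are placeholders invented for illustration, not real picks, and pick_model is a hypothetical helper name.

```python
from itertools import product

TASK_TYPES = ("simple", "code", "reasoning", "complex")
MODES = ("eco", "balanced", "sport")

# Placeholder model names; a calibrated table would hold real picks.
ROUTING_TABLE = {
    ("simple", "eco"): "small-model",
    ("simple", "balanced"): "small-model",
    ("simple", "sport"): "mid-model",
    ("code", "eco"): "small-model",
    ("code", "balanced"): "code-model",
    ("code", "sport"): "frontier-model",
    ("reasoning", "eco"): "mid-model",
    ("reasoning", "balanced"): "frontier-model",
    ("reasoning", "sport"): "frontier-model",
    ("complex", "eco"): "mid-model",
    ("complex", "balanced"): "frontier-model",
    ("complex", "sport"): "frontier-model",
}

# Every (task_type, mode) cell must be filled, or some request cannot be routed.
MISSING_CELLS = [c for c in product(TASK_TYPES, MODES) if c not in ROUTING_TABLE]


def pick_model(task_type: str, mode: str) -> str:
    """Look up the model for one cell of the grid."""
    try:
        return ROUTING_TABLE[(task_type, mode)]
    except KeyError:
        raise ValueError(f"no routing cell for ({task_type!r}, {mode!r})") from None
```

Keeping the table as data rather than branching logic means a recalibration only swaps values; the lookup code stays unchanged.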
The table is calibrated from measured benchmark data: per-model quality and per-model cost for each task type, with picks chosen by an optimisation rule (the cheapest model meeting the quality floor for eco; the best quality-per-cost for balanced; the highest quality for sport). The benchmark is refreshed quarterly to catch model-catalog changes: new models, deprecated models, and pricing shifts.
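The three selection rules can be sketched as follows. This is an illustration under stated assumptions, not the actual calibration code: the BenchmarkRow shape, the 0.7 quality floor, applying the floor to balanced as well as eco, and the fallback when no model clears the floor are all choices made for the example. The sample rows are made-up numbers, not measured results.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class BenchmarkRow:
    model: str
    task_type: str
    quality: float  # assumed scale: 0 to 1, higher is better
    cost: float     # assumed unit: USD per request, must be > 0


def calibrate(
    rows: Iterable[BenchmarkRow], quality_floor: float = 0.7
) -> Dict[Tuple[str, str], str]:
    """Fill one (task_type, mode) cell per mode for each benchmarked task type."""
    by_task: Dict[str, List[BenchmarkRow]] = {}
    for row in rows:
        by_task.setdefault(row.task_type, []).append(row)

    table: Dict[Tuple[str, str], str] = {}
    for task_type, candidates in by_task.items():
        # Assumption: if nothing clears the floor, fall back to all candidates.
        eligible = [r for r in candidates if r.quality >= quality_floor] or candidates
        # eco: cheapest model meeting the quality floor.
        table[(task_type, "eco")] = min(eligible, key=lambda r: r.cost).model
        # balanced: best quality per unit cost (floor applied by assumption).
        table[(task_type, "balanced")] = max(
            eligible, key=lambda r: r.quality / r.cost
        ).model
        # sport: highest quality regardless of cost.
        table[(task_type, "sport")] = max(candidates, key=lambda r: r.quality).model
    return table


# Made-up numbers for illustration only.
ILLUSTRATIVE_ROWS = [
    BenchmarkRow("small", "reasoning", quality=0.55, cost=0.001),
    BenchmarkRow("mid-a", "reasoning", quality=0.72, cost=0.003),
    BenchmarkRow("mid-b", "reasoning", quality=0.85, cost=0.0034),
    BenchmarkRow("frontier", "reasoning", quality=0.92, cost=0.02),
]
```

With these illustrative rows, calibrate picks mid-a for eco, mid-b for balanced, and frontier for sport in the reasoning row of the table. A quarterly refresh would amount to rerunning calibrate on new benchmark rows and replacing the table.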
Why this beats hand-coded rules
A common starting point is hand-coded routing: "if the request contains code blocks, use GPT-4o; otherwise use Claude Sonnet." This is simple, transparent, and easy to debug. The case for task-type routing emerges when the rule set grows beyond 5-10 rules and the rules start to conflict, or when the task distribution is too varied for hand-coded heuristics to capture cleanly. Most mature production deployments end up with task-type routing for the classifier-driven 80% of decisions plus a small set of explicit overrides for the remaining edge cases.
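One way to combine the two, sketched under assumptions: explicit overrides are checked first in order, and anything they don't claim falls through to the classifier and the routing table. The request shape (a dict with "prompt", optional "mode", and optional "tenant" keys), the route function, the sample override, and the stub classifier and table are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]
Override = Tuple[Callable[[Request], bool], str]  # (predicate, model name)


def route(
    request: Request,
    overrides: List[Override],
    classify: Callable[[str], str],
    table: Dict[Tuple[str, str], str],
    default_mode: str = "balanced",
) -> str:
    """Return a model name: explicit overrides first, then classifier + table."""
    for matches, model in overrides:
        if matches(request):
            return model  # first matching override wins
    task_type = classify(request["prompt"])
    mode = request.get("mode", default_mode)
    return table[(task_type, mode)]


# Hypothetical override: one tenant is always pinned to a placeholder model.
OVERRIDES: List[Override] = [
    (lambda req: req.get("tenant") == "legal-team", "frontier-model"),
]


def stub_classify(prompt: str) -> str:
    """Stand-in for a real classifier; labels everything as 'simple'."""
    return "simple"


DEMO_TABLE = {
    ("simple", "eco"): "small-model",
    ("simple", "balanced"): "mid-model",
}
```

Keeping the override list short and ordered preserves the debuggability of hand-coded rules for the edge cases without letting the rule set grow into the conflicting tangle described above.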