LLM Complexity Router

A fine-tuned DeBERTa-v3-small classifier that routes queries between gpt-4o-mini (cheap) and gpt-4o (expensive), cutting cost by ~41% while slightly improving response quality (8.24 vs 8.11 on WildBench) compared with always using the expensive model.

Performance (WildBench, 200 real user queries)

Strategy          Quality (1-10)  Cost/1K  % Cheap  Quality Δ  Cost Saved
always_expensive  8.11            $6.00    0%       baseline   baseline
length_based      8.02            $3.35    47%      -0.09      +44.2%
deberta_router    8.24            $3.55    43.5%    +0.13      +40.9%
routellm_mf       7.96            $3.60    42.5%    -0.15      +39.9%

deberta_router is the only strategy tested that both beats the always_expensive baseline on quality and reduces cost.
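
The Cost/1K and Cost Saved figures can be reproduced from the share of queries sent to the cheap model. The sketch below shows that arithmetic. The $6.00 expensive cost comes from the always_expensive row; the cheap-model cost of $0.36 is not stated in this card and is an assumption, back-solved from the length_based row (47% cheap, $3.35). The function names are illustrative only.

```python
# Sketch of the blended-cost arithmetic behind the Cost/1K and Cost Saved columns.
EXPENSIVE_COST = 6.00  # Cost/1K from the always_expensive row
CHEAP_COST = 0.36      # ASSUMPTION: back-solved from the length_based row, not stated in the card


def blended_cost(cheap_share, cheap_cost=CHEAP_COST, expensive_cost=EXPENSIVE_COST):
    """Average cost when a fraction `cheap_share` of queries goes to the cheap model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_cost + (1.0 - cheap_share) * expensive_cost


def cost_saved_pct(cheap_share, cheap_cost=CHEAP_COST, expensive_cost=EXPENSIVE_COST):
    """Percentage saved relative to sending every query to the expensive model."""
    cost = blended_cost(cheap_share, cheap_cost, expensive_cost)
    return 100.0 * (1.0 - cost / expensive_cost)


if __name__ == "__main__":
    # "% Cheap" values taken from the table above.
    for name, share in [("length_based", 0.47), ("deberta_router", 0.435)]:
        print(f"{name}: ${blended_cost(share):.2f} per 1K, {cost_saved_pct(share):.1f}% saved")
```

Under that assumed cheap-model cost, the computed values agree with the table to the precision shown for these two rows.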

Category Breakdown (vs always_expensive)

Category             Router  Baseline  Δ
Advice seeking       9.50    9.00      +0.50
Brainstorming        8.40    8.20      +0.20
Coding & Debugging   7.73    7.89      -0.16
Creative Writing     8.00    7.74      +0.26
Data Analysis        9.20    9.00      +0.20
Editing              8.40    8.40      +0.00
Information seeking  8.11    7.83      +0.28
Math                 8.33    7.83      +0.50
Planning             8.59    8.77      -0.18
Reasoning            8.82    8.61      +0.21
Role playing         7.00    6.71      +0.29

Usage

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-username/complexity-router"
)

result = classifier("What is the capital of France?")
# → [{'label': 'SIMPLE', 'score': 0.98}]  → route to gpt-4o-mini

result = classifier("Prove the Riemann hypothesis step by step")
# → [{'label': 'COMPLEX', 'score': 0.95}]  → route to gpt-4o
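
The classifier only returns a label, so the caller still has to turn that label into a model choice. One possible way to do that is sketched below. The `choose_model` helper, its `min_confidence` threshold, and the fall-back-to-gpt-4o behaviour are assumptions for illustration and are not part of this model or its evaluation.

```python
# Hypothetical helper: map a text-classification result to a target model name.
LABEL_TO_MODEL = {"SIMPLE": "gpt-4o-mini", "COMPLEX": "gpt-4o"}


def choose_model(predictions, min_confidence=0.6, fallback="gpt-4o"):
    """Pick a model from pipeline-style output such as [{'label': 'SIMPLE', 'score': 0.98}].

    Falls back to the expensive model when the output is empty, the label is
    unknown, or the top score is below `min_confidence` (an assumed threshold).
    """
    if not predictions:
        return fallback
    top = max(predictions, key=lambda p: p["score"])
    if top["label"] not in LABEL_TO_MODEL or top["score"] < min_confidence:
        return fallback
    return LABEL_TO_MODEL[top["label"]]


if __name__ == "__main__":
    # Stand-in for `classifier(query)`; no model download is needed to try this.
    print(choose_model([{"label": "SIMPLE", "score": 0.98}]))
    print(choose_model([{"label": "SIMPLE", "score": 0.55}]))
```

Falling back to the expensive model on low confidence trades some savings for safety; the threshold would need tuning on your own traffic.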

Training

  • Base model: microsoft/deberta-v3-small
  • Training data: proprietary (not released)
  • Labels: SIMPLE / COMPLEX
  • Benchmarked against: RouteLLM mf router, length-based baseline
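
For context on the length-based baseline: this card does not say how it was defined, so the sketch below only illustrates the general idea of routing on query length. The 200-character cutoff and the function name are placeholders, not the values used in the benchmark.

```python
# Illustrative length-based router; the cutoff is an ASSUMPTION, not the benchmark's setting.
def length_based_route(query, max_cheap_chars=200):
    """Send short queries to the cheap model and longer ones to the expensive model."""
    if len(query.strip()) <= max_cheap_chars:
        return "gpt-4o-mini"
    return "gpt-4o"


if __name__ == "__main__":
    print(length_based_route("What is the capital of France?"))
    print(length_based_route("Explain this stack trace. " * 20))
```

A rule like this ignores content entirely, which is why a short but hard query ends up on the cheap model.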

Limitations

  • Scores below the always_expensive baseline on Coding & Debugging (-0.16) and Planning (-0.18)
  • Optimized for gpt-4o vs gpt-4o-mini routing specifically
  • Training data distribution may not match all use cases

Model details

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32