The current cascading strategy routes requests to cheaper models first and escalates only on failure. This creates a structural bias where cheaper models dominate traffic and premium models are under-explored.
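As a result, the bandit receives biased feedback: premium models are observed only on requests that cheaper models have already failed, so their rewards come from a small, unrepresentative sample, and the bandit cannot accurately estimate their value.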
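The bias described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the system's actual router: the two model names, the success thresholds (0.7 and 0.95), and the uniform "difficulty" model are all hypothetical, chosen only to make the effect visible. Each request has a difficulty drawn uniformly from [0, 1), and a model succeeds when the difficulty is below its threshold.

```python
import random


def simulate_cascade(n_requests=10_000, seed=0):
    """Simulate a cheap-first cascade and report what a bandit would observe.

    All numbers here are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)

    # Hypothetical true success rates over ALL traffic: a model succeeds
    # when the request difficulty falls below its threshold.
    true_rate = {"cheap": 0.70, "premium": 0.95}

    pulls = {"cheap": 0, "premium": 0}
    successes = {"cheap": 0, "premium": 0}

    for _ in range(n_requests):
        difficulty = rng.random()
        # Cascade: try the cheap model first, escalate only on failure.
        for model in ("cheap", "premium"):
            pulls[model] += 1
            ok = difficulty < true_rate[model]
            successes[model] += ok
            if ok:
                break

    # Success rate as seen through the cascade's feedback alone.
    observed = {m: successes[m] / pulls[m] for m in pulls}
    return pulls, observed, true_rate


if __name__ == "__main__":
    pulls, observed, true_rate = simulate_cascade()
    for model in ("cheap", "premium"):
        print(model, pulls[model], round(observed[model], 3), true_rate[model])
```

Under these assumptions the premium model is tried on only about 30% of requests, and only on the hard ones, so its observed success rate is about 0.83 in expectation (0.25 / 0.30) even though it would succeed on 95% of all traffic. An estimator fed only this feedback would therefore undervalue the premium model.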