This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and evaluate it on RouterBench under a simulated online setting. Experimental results show that the proposed method consistently outperforms random and min-cost baselines in utility reward. Compared with the max-quality reference, our method achieves substantially lower inference cost while maintaining competitive reward. These findings suggest that NeuralUCB is a promising approach for cost-aware LLM routing, while also highlighting remaining challenges in action discrimination and exploration.
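To make the routing loop concrete, the following is a minimal sketch of an upper-confidence-bound router trained under simulated online feedback. It is not the implementation evaluated in this study. It rests on several assumptions: a linear UCB rule stands in for NeuralUCB (which would replace the raw context features with a neural network's gradient features), the utility reward is assumed to take the form quality minus a weighted cost, and the two candidate models with their quality and cost values are invented toy numbers rather than RouterBench data. The names `LinUCBRouter`, `utility`, and `simulate` are hypothetical.

```python
import math


class LinUCBRouter:
    """Linear UCB stand-in for a NeuralUCB router (assumption: one linear model per LLM)."""

    def __init__(self, n_models, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha  # exploration strength
        # Per-model inverse covariance A^-1 (initialised to I / ridge) and reward vector b.
        self.A_inv = [[[1.0 / ridge if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_models)]
        self.b = [[0.0] * dim for _ in range(n_models)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def scores(self, x):
        """Return estimated reward plus exploration bonus for every candidate model."""
        out = []
        for A_inv, b in zip(self.A_inv, self.b):
            theta = self._matvec(A_inv, b)
            mean = sum(t * xi for t, xi in zip(theta, x))
            Ax = self._matvec(A_inv, x)
            width = math.sqrt(max(sum(a * xi for a, xi in zip(Ax, x)), 0.0))
            out.append(mean + self.alpha * width)
        return out

    def select(self, x):
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)

    def update(self, arm, x, reward):
        """Partial feedback: only the chosen model's statistics are updated."""
        A_inv = self.A_inv[arm]
        Ax = self._matvec(A_inv, x)
        denom = 1.0 + sum(a * xi for a, xi in zip(Ax, x))
        # Sherman-Morrison rank-one update of A^-1 after adding x x^T to A.
        for i in range(len(x)):
            for j in range(len(x)):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom
        for i, xi in enumerate(x):
            self.b[arm][i] += reward * xi


def utility(quality, cost, lam=1.0):
    """Assumed cost-aware reward: response quality penalised by weighted cost."""
    return quality - lam * cost


def simulate(rounds=200):
    # Toy candidates as (quality, cost); illustrative values only.
    models = [(0.5, 0.1), (0.9, 0.9)]
    router = LinUCBRouter(n_models=len(models), dim=1)
    counts = [0] * len(models)
    x = [1.0]  # constant context; a real router would embed the query
    for _ in range(rounds):
        arm = router.select(x)
        quality, cost = models[arm]
        router.update(arm, x, utility(quality, cost))
        counts[arm] += 1
    return router, counts


if __name__ == "__main__":
    router, counts = simulate()
    print("selections per model:", counts)
```

Under these toy settings the cheaper model has the higher assumed utility, so the router should concentrate its selections on it after a brief exploration phase. Changing the cost weight `lam` shifts that balance toward the higher-quality model.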