Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance against inference cost. However, this routing strategy introduces a new security concern: adversaries may manipulate the router into consistently selecting expensive, high-capability models. Existing routing attacks depend on either white-box access or heuristic prompts, which renders them ineffective in real-world black-box scenarios. In this work, we propose R$^2$A, which aims to mislead black-box LLM routers into selecting expensive models via adversarial suffix optimization. Specifically, R$^2$A deploys a hybrid ensemble surrogate router to mimic the black-box router, and adapts a suffix optimization algorithm to this ensemble-based surrogate. Extensive experiments on multiple open-source and commercial routing systems demonstrate that R$^2$A significantly increases the routing rate to expensive models on queries drawn from different distributions. Code and examples: https://github.com/thcxiker/R2A-Attack.
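As a rough illustration of the idea summarized above, the following toy sketch optimizes a suffix against an ensemble of surrogate scorers. It is not the R$^2$A implementation: the two scorers (`length_scorer`, `keyword_scorer`), the vocabulary, and the random coordinate search are all hypothetical stand-ins chosen so that the example is self-contained. The abstract does not specify the surrogate architecture or the optimizer, and a real attack would presumably use learned surrogate routers rather than hand-written heuristics.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A surrogate scorer maps a query to an estimated probability in [0, 1]
# that a router would pick the expensive model. Both scorers below are
# hypothetical heuristics, not components of R^2A.
Scorer = Callable[[str], float]


def length_scorer(query: str) -> float:
    # Assumption: longer queries look "harder" to a router.
    return min(len(query.split()) / 50.0, 1.0)


def keyword_scorer(query: str) -> float:
    # Assumption: certain words signal a demanding task.
    hard_words = {"prove", "derive", "optimize", "theorem", "complexity"}
    hits = sum(word in hard_words for word in query.lower().split())
    return min(hits / 3.0, 1.0)


def ensemble_score(query: str, scorers: Sequence[Scorer]) -> float:
    # Average the surrogate scores so the suffix must satisfy all
    # ensemble members rather than overfit to a single one.
    return sum(scorer(query) for scorer in scorers) / len(scorers)


def optimize_suffix(
    query: str,
    scorers: Sequence[Scorer],
    vocab: List[str],
    suffix_len: int = 5,
    iters: int = 200,
    seed: int = 0,
) -> Tuple[str, float]:
    # Random coordinate search: mutate one suffix token at a time and
    # keep the change only if the ensemble score improves.
    rng = random.Random(seed)
    suffix = [rng.choice(vocab) for _ in range(suffix_len)]
    best = ensemble_score(query + " " + " ".join(suffix), scorers)
    for _ in range(iters):
        candidate = suffix.copy()
        candidate[rng.randrange(suffix_len)] = rng.choice(vocab)
        score = ensemble_score(query + " " + " ".join(candidate), scorers)
        if score > best:
            suffix, best = candidate, score
    return " ".join(suffix), best


if __name__ == "__main__":
    scorers = [length_scorer, keyword_scorer]
    vocab = ["please", "prove", "derive", "the", "theorem", "quickly"]
    query = "what is 2 plus 2"
    suffix, score = optimize_suffix(query, scorers, vocab)
    print("baseline score:", ensemble_score(query, scorers))
    print("suffix:", suffix)
    print("attacked score:", score)
```

Averaging over several surrogates is one simple way to encourage a suffix to transfer to an unseen black-box router; whether R$^2$A aggregates its ensemble this way is an assumption of this sketch.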