As large language models (LLMs) move from centralized clouds to mobile edge environments, efficient serving must balance latency, energy consumption, and accuracy under constrained device-edge resources. Query-level routing between lightweight on-device models and stronger edge models provides a flexible mechanism to navigate this trade-off. However, existing routers are designed for centralized cloud settings and optimize token-level costs, failing to capture the dynamic latency and energy overheads in wireless edge deployments. In this paper, we formulate mobile edge LLM routing as a deployment-constrained, cost-aware decision problem, and propose CR^2, a two-stage device-edge routing framework. CR^2 decouples a lightweight on-device margin gate from an edge-side utility selector for deferred queries. The margin gate operates on frozen query embeddings and a user-specified cost weight to predict whether local execution is utility-optimal relative to the best edge alternative under the target operating point. We further introduce a conformal risk control (CRC) calibration procedure that maps each operating point to an acceptance threshold, enabling explicit control of the marginal false-acceptance risk under the full-information utility reference. Experiments on the routing task show that CR^2 closely matches a full-information reference router using only device-side signals before deferral. Compared with strong query-level baselines, CR^2 consistently improves the deployable accuracy-cost Pareto frontier and reduces normalized deployment cost by up to 16.9% at matched accuracy.
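To make the calibration step concrete, the following is a minimal sketch of how a conformal-risk-control style threshold could be selected for a single operating point. It is an illustration under stated assumptions, not the procedure used in CR^2: the names `calibrate_threshold` and `route` are hypothetical, the gate score is assumed to be a scalar where higher values favor local execution, the label `local_is_optimal` is assumed to come from the full-information utility reference on a held-out calibration set, and the loss is assumed to be a 0/1 false-acceptance indicator. The sketch uses the standard finite-sample CRC condition, (number of false accepts + 1) / (n + 1) <= alpha, and searches only over observed score values.

```python
from typing import Sequence


def calibrate_threshold(
    scores: Sequence[float],
    local_is_optimal: Sequence[bool],
    alpha: float,
) -> float:
    """Pick the smallest observed score usable as an acceptance threshold.

    Assumption (not from the paper): a query is accepted on-device when its
    gate score is >= tau, and a "false acceptance" is an accepted query for
    which local execution is NOT utility-optimal under the reference.
    """
    n = len(scores)
    # Raising tau can only shrink the accepted set, so the false-acceptance
    # count is non-increasing in tau; scan candidates in ascending order.
    for tau in sorted(set(scores)):
        false_accepts = sum(
            1
            for s, ok in zip(scores, local_is_optimal)
            if s >= tau and not ok
        )
        # CRC finite-sample condition for a loss bounded by 1.
        if (false_accepts + 1) / (n + 1) <= alpha:
            return tau
    # No observed threshold meets the target: defer every query to the edge.
    return float("inf")


def route(score: float, tau: float) -> str:
    """Run locally if the gate score clears the calibrated threshold."""
    return "device" if score >= tau else "edge"
```

In this sketch a separate threshold would be calibrated for each cost weight, since both the gate score and the utility-optimal label depend on the operating point. A threshold of infinity means the calibration set is too small, or the target risk too strict, for any local acceptance to be certified.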