Optimizing the deployment of large language models (LLMs) in edge computing environments is critical for enhancing privacy and computational efficiency. Toward efficient wireless LLM inference at the edge, this study comprehensively analyzes how the choice of splitting point affects mainstream open-source LLMs. Building on this analysis, we introduce a framework inspired by model-based reinforcement learning (MBRL) that determines the optimal splitting point between the edge server and the user equipment (UE). By incorporating a reward surrogate model, our approach substantially reduces the computational cost of frequent performance evaluations. Extensive simulations demonstrate that the method effectively balances inference performance and computational load under varying network conditions, providing a robust solution for LLM deployment in decentralized settings.
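The core idea of the abstract can be illustrated with a minimal sketch: instead of running the expensive ground-truth evaluation (real split inference over a wireless link) for every candidate layer, a cheap surrogate is fitted from a small evaluation budget and then queried to pick the splitting point. Everything below is illustrative and not the paper's actual method: the layer count, the toy reward trading UE compute against communication cost, and the tabular surrogate are all assumptions.

```python
import random

NUM_LAYERS = 32  # assumed LLM depth, for illustration only


def true_reward(split, bandwidth):
    """Stand-in for an expensive ground-truth evaluation (toy trade-off):
    later splits put more layers on the UE; earlier splits send larger
    activations over the wireless link."""
    ue_cost = split / NUM_LAYERS
    comm_cost = (1.0 - split / NUM_LAYERS) / bandwidth
    return -(ue_cost + comm_cost)


class RewardSurrogate:
    """Tabular surrogate: running mean of observed rewards per split point."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, split, reward):
        self.sums[split] = self.sums.get(split, 0.0) + reward
        self.counts[split] = self.counts.get(split, 0) + 1

    def predict(self, split):
        if split not in self.counts:
            return float("-inf")  # never evaluated: do not select it
        return self.sums[split] / self.counts[split]


def select_split(bandwidth, budget=10, seed=0):
    """Spend a small budget on true evaluations, then let the surrogate
    rank all candidate splitting points."""
    rng = random.Random(seed)
    surrogate = RewardSurrogate()
    candidates = list(range(1, NUM_LAYERS))
    for _ in range(budget):
        s = rng.choice(candidates)
        surrogate.update(s, true_reward(s, bandwidth))
    return max(candidates, key=surrogate.predict)


if __name__ == "__main__":
    print(select_split(bandwidth=2.0))  # a layer index in 1..31
```

The surrogate here is deliberately crude; the point is only that candidate splitting points are ranked by predicted reward, so the number of expensive evaluations is decoupled from the number of candidates.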