The rapid advancement of large language models (LLMs) has enabled the emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. Integrating these capabilities with edge computing has led to the development of Mobile Edge General Intelligence (MEGI), which brings real-time, privacy-preserving reasoning to the network edge. However, deploying LLM-based agentic AI reasoning in MEGI environments poses significant challenges due to the high computational demands of reasoning and the limited resources of edge devices. To address these challenges, we propose a joint optimization framework for efficient LLM reasoning deployment in MEGI. First, we systematically review reasoning enhancement methods to identify mechanisms suitable for edge adaptation. We then present a distributed framework that synergizes reasoning enhancement via adaptive Chain-of-Thought (CoT) prompting with scalable deployment through a distributed Mixture-of-Experts (MoE) architecture. A key innovation of this approach is modeling reasoning depth as a dynamic network resource variable, optimized jointly with expert activation and transmission power. This mechanism allows the system to regulate expert networks and reasoning complexity according to task requirements and device capabilities. Experimental evaluations in mobile edge environments demonstrate that the proposed framework effectively balances reasoning quality and resource efficiency. The results show that, with less than one second of additional inference time, both accuracy and the latency satisfaction rate can reach 90\%, validating the practical viability of deploying sophisticated LLM reasoning in resource-constrained MEGI systems.
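The joint optimization sketched above (reasoning depth, expert activation, and transmission power under a latency budget) can be illustrated with a minimal toy model. Everything below is an assumption for illustration: the quality and latency functions, the candidate values, and the one-second budget are hypothetical stand-ins, not the paper's actual formulation.

```python
# Hypothetical sketch: jointly choose CoT reasoning depth d, number of
# active MoE experts k, and transmission power p to maximize a toy
# quality score subject to a latency budget. All models are illustrative.

from itertools import product

DEPTHS = [1, 2, 4, 8]        # candidate CoT reasoning steps
EXPERTS = [1, 2, 4]          # candidate active experts per token
POWERS_W = [0.1, 0.5, 1.0]   # candidate transmission powers (watts)
LATENCY_BUDGET_S = 1.0       # illustrative end-to-end latency budget

def quality(d, k):
    # toy diminishing-returns model: deeper reasoning and more
    # experts both improve quality, with saturating gains
    return 1 - 0.5 ** d * 0.8 ** k

def latency(d, k, p):
    # toy cost model: compute time grows with depth * experts,
    # transmission time shrinks with power
    compute = 0.05 * d * k
    transmit = 0.2 / p
    return compute + transmit

def best_config():
    # exhaustive search over the small discrete configuration space,
    # keeping only configurations that meet the latency budget
    feasible = [(quality(d, k), d, k, p)
                for d, k, p in product(DEPTHS, EXPERTS, POWERS_W)
                if latency(d, k, p) <= LATENCY_BUDGET_S]
    return max(feasible) if feasible else None

if __name__ == "__main__":
    q, d, k, p = best_config()
    print(f"depth={d}, experts={k}, power={p}W, quality={q:.3f}")
```

In a real MEGI deployment the discrete search would be replaced by the framework's joint optimizer, but the structure is the same: reasoning depth is treated as just another resource variable traded off against expert activation and radio resources.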