The rapid development of generative AI technologies, including large language models (LLMs), has brought transformative changes to various fields. However, deploying such advanced models on mobile and edge devices remains challenging due to their high computational, memory, communication, and energy requirements. To address these challenges, we propose a model-centric framework for democratizing generative AI deployment on mobile and edge networks. First, we comprehensively review key compact-model strategies, such as quantization, model pruning, and knowledge distillation, and present key performance metrics for optimizing generative AI for mobile deployment. Next, we provide a focused review of mobile and edge networks, emphasizing the specific challenges and requirements of these environments. We further conduct a case study demonstrating the effectiveness of these strategies by deploying LLMs on real mobile edge devices. Experimental results highlight the practicality of democratized LLMs, with significant improvements in generalization accuracy and accessibility, along with reductions in hallucination rate and resource consumption. Finally, we discuss potential research directions to further advance the deployment of generative AI in resource-constrained environments.