This paper explores the feasibility and performance of on-device large language model (LLM) inference across a range of Apple iPhone models. Amid the rapid evolution of generative AI, on-device LLMs offer solutions to the privacy, security, and connectivity challenges inherent in cloud-based models. Building on existing literature on running multi-billion-parameter LLMs on resource-constrained devices, our study examines the thermal behavior and interaction speed of a high-performing LLM across several smartphone generations. We present real-world performance results, providing insight into current on-device inference capabilities.