Language models have attracted significant interest due to their general-purpose capabilities, which appear to emerge as models are scaled to increasingly larger parameter sizes. However, these large models impose stringent demands on computing systems, requiring substantial memory and processing power for inference. This makes inference on mobile and edge devices challenging, often forcing applications to invoke remotely hosted models via network calls. Remote inference, in turn, introduces issues such as latency, unreliable network connectivity, and privacy concerns. To address these challenges, we explore the possibility of deviating from the trend of increasing model size. Instead, we hypothesize that much smaller models (~30-120M parameters) can outperform their larger counterparts on specific tasks by carefully curating the data used for pre-training and fine-tuning. We investigate this within the context of deploying edge-device models to support sensing applications. Through a systematic study, we trained several foundational models and found that small models can run locally on edge devices, achieving high token rates and accuracy. Based on these findings, we developed a framework that allows users to train foundational models tailored to their specific applications and deploy them at the edge.