Processing Natural Language on Embedded Devices: How Well Do Modern Models Perform?

Voice-controlled systems are becoming ubiquitous in many IoT-specific applications such as home/industrial automation, automotive infotainment, and healthcare. While cloud-based voice services (\eg Alexa, Siri) can leverage high-performance computing servers, some use cases (\eg robotics, automotive infotainment) may require executing natural language processing (NLP) tasks offline, often on resource-constrained embedded devices. Large language models such as BERT and its variants are primarily developed with compute-heavy servers in mind. Despite the strong performance of BERT models across various NLP tasks, their large size and parameter count pose substantial obstacles to offline computation on embedded systems. Lighter replacements for such language models (\eg DistilBERT and TinyBERT) often sacrifice accuracy, particularly on complex NLP tasks. It remains unclear \ca whether state-of-the-art language models, \viz BERT and its variants, are deployable on embedded systems with limited processing capacity, memory, and battery power, and \cb if they are, what the ``right'' set of configurations and parameters is for a given NLP task. This paper presents an \textit{exploratory study of modern language models} under different resource constraints and accuracy budgets to derive empirical observations about these resource/accuracy trade-offs. In particular, we study how four of the most commonly used BERT-based language models (\viz BERT, RoBERTa, DistilBERT, and TinyBERT) perform on embedded systems. We tested them on a Raspberry Pi-based robotic platform with three hardware configurations and four datasets covering various NLP tasks. Our findings can help designers understand the deployability and performance of modern language models, especially those based on BERT architectures, and thus save time otherwise spent on trial and error.
