The Internet of Things (IoT), a network integrating billions of smart physical devices embedded with sensors, software, and communication technologies, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities, such as motion, thermal, geolocation, imaging, depth, audio, and other sensor signals, for recognizing the states of humans and physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference for understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To realize this potential, we introduce IoT-LM, an open-source large multisensory language model tailored for the IoT ecosystem. IoT-LM is enabled by two technical contributions. The first is MultiIoT, the most expansive unified IoT dataset to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks prepared for multisensory pre-training and instruction-tuning. The second is a new multisensory multitask adapter layer that conditions pre-trained large language models on multisensory IoT data. IoT-LM not only yields substantial improvements on 8 supervised IoT classification tasks, but also demonstrates new interactive question-answering, reasoning, and dialog capabilities conditioned on IoT sensors. We release IoT-LM's data sources and new multisensory language modeling framework.