The Internet of Things (IoT), the network of billions of smart physical devices embedded with sensors, software, and communication technologies that connect and exchange data with other devices and systems, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, video, and audio for prediction tasks involving the pose, gaze, activities, and gestures of humans as well as the touch, contact, pose, and 3D structure of physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference with impact on understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To develop machine learning technologies for IoT, this paper proposes MultiIoT, the most expansive IoT benchmark to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks. MultiIoT introduces unique challenges involving (1) learning from many sensory modalities, (2) fine-grained interactions across long temporal ranges, and (3) extreme heterogeneity due to unique structure and noise topologies in real-world sensors. We also release a set of strong modeling baselines, ranging from modality- and task-specific methods to multisensory and multitask models, to encourage future research in multisensory representation learning for IoT.