The proliferation of IoT and mobile devices equipped with heterogeneous sensors has enabled new applications that rely on fusing time-series data generated by multiple sensors with different modalities. While there are promising deep neural network architectures for multimodal fusion, their performance degrades quickly in the presence of consecutive missing data and noise across multiple modalities or sensors, issues that are prevalent in real-world settings. We propose Centaur, a multimodal fusion model for human activity recognition (HAR) that is robust to these data quality issues. Centaur combines a data cleaning module, which is a denoising autoencoder with convolutional layers, and a multimodal fusion module, which is a deep convolutional neural network with a self-attention mechanism that captures cross-sensor correlation. We train Centaur using a stochastic data corruption scheme and evaluate it on three datasets that contain data generated by multiple inertial measurement units. Centaur's data cleaning module outperforms two state-of-the-art autoencoder-based models, and its multimodal fusion module outperforms four strong baselines. Compared to two related robust fusion architectures, Centaur is more robust, achieving 11.59-17.52% higher accuracy in HAR, especially in the presence of consecutive missing data in multiple sensor channels.
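The abstract does not spell out the stochastic data corruption scheme, so the following is only a minimal sketch of what such a scheme could look like, not the authors' procedure. It assumes a window is a list of sensor channels, adds Gaussian noise to every sample, and, with some probability per channel, zeroes out one run of consecutive samples to mimic consecutive missing data. The function name `corrupt_window` and the parameters `p_channel`, `max_gap`, and `noise_std` are hypothetical.

```python
import random


def corrupt_window(window, p_channel=0.5, max_gap=20, noise_std=0.1, rng=None):
    """Return a corrupted copy of `window` and a mask of kept samples.

    `window` is a list of channels, each a list of floats.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    corrupted = [list(channel) for channel in window]  # leave the input untouched
    mask = [[1] * len(channel) for channel in window]  # 1 = observed, 0 = missing

    for c, channel in enumerate(corrupted):
        # Additive Gaussian noise on every sample of the channel.
        for t in range(len(channel)):
            channel[t] += rng.gauss(0.0, noise_std)

        # With probability p_channel, drop one run of consecutive samples.
        if channel and rng.random() < p_channel:
            gap = rng.randint(1, min(max_gap, len(channel)))
            start = rng.randint(0, len(channel) - gap)
            for t in range(start, start + gap):
                channel[t] = 0.0  # assumption: missing samples are zero-filled
                mask[c][t] = 0

    return corrupted, mask
```

Under this sketch, a denoising autoencoder would be trained to reconstruct the clean `window` from `corrupted`; drawing a fresh corruption for each training batch is what would make the scheme stochastic.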