Human activity recognition (HAR) will be an essential function of many emerging applications. However, HAR typically faces modality limitations and label scarcity, which leave a gap between current solutions and real-world requirements. In this work, we propose MESEN, a multimodal-empowered unimodal sensing framework that uses unlabeled multimodal data available during the HAR model design phase to enhance unimodal HAR during the deployment phase. Motivated by a study of how supervised multimodal fusion affects unimodal feature extraction, MESEN features a multi-task mechanism in its multimodal-aided pre-training stage. By integrating cross-modal feature contrastive learning with multimodal pseudo-classification alignment, this mechanism exploits unlabeled multimodal data to extract effective unimodal features for each modality. MESEN can then adapt to downstream unimodal HAR with only a few labeled samples. Extensive experiments on eight public multimodal datasets demonstrate that MESEN achieves significant performance improvements over state-of-the-art baselines in enhancing unimodal HAR by exploiting multimodal data.
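To make the idea of cross-modal feature contrastive learning concrete, the following is a minimal sketch of a generic symmetric InfoNCE-style loss between two modalities. It is an illustration under stated assumptions, not MESEN's actual objective: the abstract does not specify the loss, and the function names (`cross_modal_contrastive_loss`, `cross_entropy_on_diagonal`), the cosine-similarity scoring, and the `temperature` value are all hypothetical choices. Row `i` of each feature list is assumed to come from the same unlabeled time window, so matching rows are treated as positive pairs and all other rows as negatives.

```python
import math


def normalize(vector):
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def cross_entropy_on_diagonal(logits):
    """Mean of -log softmax(logits[i])[i], i.e. row i should pick column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def cross_modal_contrastive_loss(feats_a, feats_b, temperature=0.1):
    """Symmetric InfoNCE-style loss between two modalities (illustrative only).

    feats_a, feats_b: lists of equal length; entry i of each is the feature
    vector extracted from the same time window by a different modality.
    """
    a = [normalize(v) for v in feats_a]
    b = [normalize(v) for v in feats_b]
    # Cosine similarity between every pair of samples, scaled by temperature.
    logits = [
        [sum(x * y for x, y in zip(u, w)) / temperature for w in b]
        for u in a
    ]
    transposed = [list(column) for column in zip(*logits)]
    # Average the A-to-B and B-to-A directions so the loss is symmetric.
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed)
    )
```

Under these assumptions, the loss is small when features of the same window agree across modalities and large when they are paired with features from other windows. This is the property a contrastive pre-training stage relies on to learn from unlabeled multimodal data.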