Generalizable Low-Resource Activity Recognition with Diverse and Discriminative Representation Learning

Human activity recognition (HAR) is a time series classification task that identifies human motion patterns from sensor readings. Adequate data is essential for training a generalizable HAR model, which supports the customization and optimization of online web applications, yet data is also a major bottleneck: collecting large-scale labeled data in practice is costly in both time and money, which we refer to as the low-resource challenge. Meanwhile, data collected from different people exhibit distribution shifts due to differences in living habits, body shapes, age groups, and so on. Together, the low-resource and distribution shift challenges degrade HAR performance when a trained model is applied to new, unseen subjects. In this paper, we propose a novel approach called Diverse and Discriminative representation Learning (DDLearn) for generalizable low-resource HAR. DDLearn considers diversity learning and discrimination learning simultaneously. Through a constructed self-supervised learning task, DDLearn enlarges data diversity and explores latent activity properties. We then propose a diversity preservation module that preserves the diversity of learned features by enlarging the distribution divergence between the original and augmented domains. Meanwhile, DDLearn enhances semantic discrimination by learning discriminative representations with supervised contrastive learning. Extensive experiments on three public HAR datasets demonstrate that our method significantly outperforms state-of-the-art methods, with an average accuracy improvement of 9.5% under low-resource distribution shift scenarios, while being a generic, explainable, and flexible framework. Code is available at: https://github.com/microsoft/robustlearn.
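The abstract names two training signals without giving their formulas: supervised contrastive learning for discrimination, and a divergence between original and augmented feature distributions that is enlarged for diversity. The NumPy sketch below shows one way these could be combined. The contrastive term follows the SupCon formulation of Khosla et al. (2020). Using a Gaussian-kernel maximum mean discrepancy (MMD) as the "distribution divergence" is our assumption, as are the temperature, the bandwidth `sigma`, the weight `lam`, and the hypothetical helper name `combined_objective`. None of these are taken from DDLearn itself; the released code is the authority on the actual losses.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    # Project each embedding (row) onto the unit sphere.
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def supervised_contrastive_loss(features, labels, temperature=0.1):
    # SupCon-style loss: pull same-label embeddings together, push others apart.
    z = l2_normalize(np.asarray(features, dtype=float))
    labels = np.asarray(labels)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Cosine similarities scaled by temperature; self-pairs are excluded.
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)
    # Numerically stable row-wise log-softmax over all non-self samples.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Positives: same activity label, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    has_pos = pos_count > 0
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    # Mean log-probability of positives, averaged over anchors with a positive.
    return float(-(pos_log_prob[has_pos] / pos_count[has_pos]).mean())


def gaussian_kernel(a, b, sigma):
    # Pairwise Gaussian (RBF) kernel matrix between rows of a and rows of b.
    sq_dist = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    # Biased estimate of squared MMD between two feature samples (ASSUMED divergence).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )


def combined_objective(f_orig, f_aug, labels, lam=1.0, temperature=0.1, sigma=1.0):
    # HYPOTHETICAL combination, not DDLearn's actual loss:
    # contrastive term over both domains minus a weighted divergence term.
    feats = np.concatenate([f_orig, f_aug], axis=0)
    labs = np.concatenate([labels, labels], axis=0)
    con = supervised_contrastive_loss(feats, labs, temperature)
    div = mmd2(f_orig, f_aug, sigma)
    return con - lam * div
```

Subtracting the divergence means that minimizing the total objective pushes the original and augmented feature distributions apart, matching the direction the diversity preservation module is described as taking, while the contrastive term pulls same-activity embeddings together across both domains. Note that the contrastive term works on normalized embeddings whereas the MMD term here uses raw features; that split is also a design assumption of this sketch.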
