In recent years, self-supervised learning (SSL) frameworks have been widely applied to sensor-based Human Activity Recognition (HAR) to learn deep representations without data annotations. Although SSL frameworks achieve performance nearly on par with supervised models, few studies have interpreted the representations that SSL models learn. Modern explainability methods could help to unravel the differences between SSL and supervised representations: how they are learnt, which properties of the input data they preserve, and when SSL can be chosen over supervised training. In this paper, we analyze the deep representations of two recent SSL frameworks, SimCLR and VICReg. Specifically, we focus on (i) comparing the robustness of supervised and SSL models to corruptions in input data; (ii) explaining the predictions of deep learning models using saliency maps and highlighting which input channels contribute most to predicting various activities; and (iii) exploring the properties encoded in SSL and supervised representations using probing. Extensive experiments on two single-device datasets (MobiAct and UCI-HAR) show that SSL representations are significantly more robust to noise in unseen data than those of supervised models. In contrast, features learnt by the supervised approaches are more homogeneous across subjects and better encode the nature of activities.
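As an illustration of the robustness comparison in (i), the following is a minimal sketch of how accuracy under input corruption could be measured. It is not the protocol used in the paper: the corruption model (additive white Gaussian noise at a fixed signal-to-noise ratio), the window layout `(n_windows, time, channels)`, and the names `add_gaussian_noise`, `accuracy_under_noise`, and `predict` are all assumptions made for this example.

```python
import numpy as np


def add_gaussian_noise(windows, snr_db, rng):
    """Add white Gaussian noise to sensor windows at a target SNR (in dB).

    Assumed layout: windows has shape (n_windows, time, channels).
    The noise power is set per window and per channel.
    """
    # Mean signal power over the time axis, kept broadcastable: (n, 1, c).
    signal_power = np.mean(windows ** 2, axis=1, keepdims=True)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(size=windows.shape) * np.sqrt(noise_power)
    return windows + noise


def accuracy_under_noise(predict, windows, labels, snr_levels_db, seed=0):
    """Return {snr_db: accuracy} for a hypothetical `predict` callable.

    `predict` is assumed to map an array of windows to an array of
    predicted class labels, for either a supervised or an SSL-based model.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for snr_db in snr_levels_db:
        noisy = add_gaussian_noise(windows, snr_db, rng)
        results[snr_db] = float(np.mean(predict(noisy) == labels))
    return results
```

Running `accuracy_under_noise` with the same test windows for a supervised model and for an SSL encoder with a classification head would give two accuracy-versus-SNR curves that can be compared directly.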