As wearable-based data annotation remains, to date, a tedious, time-consuming task requiring researchers to dedicate substantial time, benchmark datasets within the field of Human Activity Recognition lack the richness and size of datasets available within related fields. Recently, vision foundation models such as CLIP have gained significant attention, helping the vision community advance in finding robust, generalizable feature representations. With the majority of researchers within the wearable community relying on vision modalities to overcome the limited expressiveness of wearable data and to accurately label their to-be-released benchmark datasets offline, we propose a novel, clustering-based annotation pipeline that significantly reduces the amount of data a human annotator needs to label. We show that, using our approach, annotating only the centroid clips suffices to achieve average labelling accuracies close to 90% across three publicly available HAR benchmark datasets. Using the weakly annotated datasets, we further demonstrate that we can match the accuracy scores of fully supervised deep learning classifiers across all three benchmark datasets. Code as well as supplementary figures and results are publicly downloadable via github.com/mariusbock/weak_har.
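To make the annotation pipeline concrete, below is a minimal sketch of the clustering-plus-centroid-labelling idea the abstract describes. All specifics (the use of scikit-learn's KMeans, averaging CLIP frame embeddings per clip, the `annotate_fn` callback standing in for the human annotator) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: cluster per-clip vision features, have a human label only
# the clip nearest each cluster centroid, and propagate that label to every
# clip in the cluster. Feature extraction (e.g. CLIP embeddings averaged
# over a clip's frames) is assumed to have happened upstream.
import numpy as np
from sklearn.cluster import KMeans


def weak_annotate(clip_features: np.ndarray, n_clusters: int, annotate_fn):
    """clip_features: (n_clips, feat_dim) array of per-clip feature vectors.
    annotate_fn: callback taking a clip index and returning a label
    (a stand-in for the human annotator)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    assignments = km.fit_predict(clip_features)

    labels = np.empty(len(clip_features), dtype=object)
    for c in range(n_clusters):
        members = np.where(assignments == c)[0]
        # The clip closest to the centroid is the one shown to the annotator.
        dists = np.linalg.norm(
            clip_features[members] - km.cluster_centers_[c], axis=1
        )
        centroid_clip = members[np.argmin(dists)]
        # One human annotation per cluster, propagated to all members.
        labels[members] = annotate_fn(centroid_clip)
    return labels
```

Under this framing, human effort drops from one annotation per clip to one per cluster; the resulting weak labels can then be used to train a downstream classifier as the abstract reports.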