Sensors on mobile devices continuously generate large volumes of data. Analyzing this growing stream is difficult because annotated data are scarce and the sensing environment keeps changing. Self-supervised learning has recently been used as a pre-training step to improve conventional supervised models when labeled datasets are lacking. This research examines the impact of a self-supervised representation learning model on time series classification tasks in which data become available incrementally. We propose and evaluate a workflow in which a model first learns to extract informative features from a corpus of unlabeled time series data; classification is then performed on labeled data using the features the model extracts. We analyze how varying the size, distribution, and source of the unlabeled data affects final classification performance across four public datasets that cover various sensor types in diverse applications.
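The two-stage workflow described above can be illustrated with a minimal sketch. The abstract does not specify the self-supervised model, the classifier, or the datasets, so everything below is an assumption made for illustration: the encoder is a linear reconstruction model (equivalent to PCA) standing in for the self-supervised feature extractor, the classifier is a nearest-centroid rule, and the data are synthetic sine windows. The names `make_windows`, `fit_encoder`, `encode`, `fit_centroids`, `predict`, and `run_workflow` are hypothetical.

```python
import numpy as np


def make_windows(n, rng, length=64, noise=0.3):
    """Synthetic stand-in data (assumption): class 0 is a slow sine, class 1 a fast sine."""
    labels = np.arange(n) % 2                      # alternate classes so both are present
    t = np.arange(length)
    freqs = np.where(labels == 0, 2.0, 5.0)        # cycles per window
    x = np.sin(2 * np.pi * freqs[:, None] * t[None, :] / length)
    x = x + noise * rng.standard_normal((n, length))
    return x, labels


def fit_encoder(unlabeled, n_features=8):
    """Stage 1: learn a feature extractor from unlabeled windows only.

    A linear reconstruction encoder (PCA via SVD) is used here as a simple
    stand-in for the self-supervised model; it is not the paper's method.
    """
    mean = unlabeled.mean(axis=0)
    _, _, vt = np.linalg.svd(unlabeled - mean, full_matrices=False)
    return mean, vt[:n_features]


def encode(x, encoder):
    """Project windows onto the learned components."""
    mean, components = encoder
    return (x - mean) @ components.T


def fit_centroids(features, labels):
    """Stage 2: fit a nearest-centroid classifier on labeled features."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(features, model):
    """Assign each feature vector to the class of its closest centroid."""
    classes, centroids = model
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def run_workflow(n_unlabeled, n_labeled, seed=0):
    """Pre-train on unlabeled data, classify with few labels, return test accuracy."""
    rng = np.random.default_rng(seed)
    unlabeled, _ = make_windows(n_unlabeled, rng)  # labels discarded on purpose
    x_train, y_train = make_windows(n_labeled, rng)
    x_test, y_test = make_windows(200, rng)

    encoder = fit_encoder(unlabeled)
    model = fit_centroids(encode(x_train, encoder), y_train)
    y_pred = predict(encode(x_test, encoder), model)
    return float((y_pred == y_test).mean())


if __name__ == "__main__":
    # Vary the size of the unlabeled corpus, one of the factors the study examines.
    for n_unlabeled in (50, 500):
        print(n_unlabeled, run_workflow(n_unlabeled, n_labeled=20))
```

Separating `fit_encoder` from `fit_centroids` keeps the unlabeled corpus independent of the labeled set, so its size, distribution, or source can be changed without touching the classification stage.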