Self-supervised learning (SSL) is a data-driven learning approach that uses the innate structure of the data to guide the learning process. In contrast to supervised learning, which depends on external labels, SSL derives its supervisory signal from the inherent characteristics of the data itself. However, a frequent issue with SSL methods is representation collapse, in which the model outputs a constant feature representation regardless of the input. This issue hinders the application of SSL methods to new data modalities, because avoiding representation collapse costs researchers time and effort. This paper introduces a novel SSL algorithm for time-series data called Prediction of Functionals from Masked Latents (PFML). Instead of predicting masked input signals or their latent representations directly, PFML predicts statistical functionals of the input signal corresponding to masked embeddings, given a sequence of unmasked embeddings. The algorithm is designed to avoid representation collapse, which makes it straightforward to apply to different time-series data domains, such as novel sensor modalities in clinical data. We demonstrate the effectiveness of PFML on complex, real-life classification tasks across three data modalities: infant posture and movement classification from multi-sensor inertial measurement unit data, emotion recognition from speech data, and sleep stage classification from EEG data. The results show that PFML is superior to a conceptually similar pre-existing SSL method and competitive with the current state-of-the-art SSL method, while being conceptually simpler and not suffering from representation collapse.
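To make the prediction target concrete, the following is a minimal sketch of how functional targets for masked frames might be computed, not the implementation used in this work. The frame length, hop size, mask probability, choice of functionals (mean, variance, minimum, maximum), and all function names are assumptions made for illustration. The encoder and the prediction network are omitted; only target construction and a masked loss are shown.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def functionals(frames):
    """Per-frame statistical functionals (an illustrative choice of four)."""
    return np.stack(
        [
            frames.mean(axis=1),
            frames.var(axis=1),
            frames.min(axis=1),
            frames.max(axis=1),
        ],
        axis=1,
    )


def masked_functional_loss(pred, target, mask):
    """Mean squared error computed over the masked frames only."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                     # toy time series
frames = frame_signal(x, frame_len=100, hop=50)   # shape (19, 100)
targets = functionals(frames)                     # shape (19, 4)

# Hypothetical masking scheme: each frame is masked with probability 0.3.
mask = rng.random(len(frames)) < 0.3
mask[0] = True  # guarantee at least one masked frame in this toy example

# A collapsed model emits the same output for every frame.
collapsed_pred = np.tile(targets.mean(axis=0), (len(frames), 1))
collapsed_loss = masked_functional_loss(collapsed_pred, targets, mask)
perfect_loss = masked_functional_loss(targets, targets, mask)
```

In this toy setting the targets differ from frame to frame, so the constant prediction incurs a positive loss while a perfect prediction incurs none. This is one plausible intuition for why predicting functionals of the input discourages a constant output; it is an interpretation rather than a claim taken from the text above.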