Fully supervised models often require large amounts of labeled training data, which tends to be costly and hard to acquire. In contrast, self-supervised representation learning reduces the amount of labeled data needed to achieve the same or even higher downstream performance. The goal is to pre-train deep neural networks on a self-supervised task so that the networks can afterwards extract meaningful features from raw input data. These features are then used as inputs in downstream tasks, such as image classification. Autoencoders and Siamese networks such as SimSiam have been employed successfully in such tasks. Yet challenges remain, such as matching the characteristics of the features (e.g., level of detail) to the given task and data set. In this paper, we present a new self-supervised method that combines the benefits of Siamese architectures and denoising autoencoders. We show that our model, called SidAE (Siamese denoising autoencoder), outperforms two self-supervised baselines across multiple data sets, settings, and scenarios. Crucially, this includes conditions in which only a small amount of labeled data is available.