Unsupervised and self-supervised representation learning has become popular in recent years as a way to learn useful features from unlabelled data. Representation learning has mostly been developed in the neural network literature, and other models for representation learning remain surprisingly unexplored. In this work, we introduce and analyse several kernel-based representation learning approaches: first, two kernel Self-Supervised Learning (SSL) models based on contrastive loss functions, and second, a Kernel Autoencoder (AE) model based on the idea of embedding and reconstructing data. We argue that the classical representer theorems for supervised kernel machines are not always applicable to (self-supervised) representation learning, and we present new representer theorems showing that the representations learned by our kernel models can be expressed in terms of kernel matrices. We further derive generalisation error bounds for representation learning with kernel SSL and AE, and empirically evaluate these methods both in small-data regimes and in comparison with neural network based models.
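To illustrate what it means for learned representations to be "expressed in terms of kernel matrices", here is a minimal sketch. It assumes a Gaussian (RBF) kernel and a representation map of the representer form f(x) = Aᵀk_X(x), where k_X(x) = [k(x_1, x), ..., k(x_n, x)] collects kernel evaluations against the n training points and A is an n × d coefficient matrix. The names `rbf_kernel`, `KernelRepresentation` and the random matrix `A` are hypothetical. In the kernel SSL and AE models, A would be obtained by minimising the respective objective, which this sketch does not implement. It only shows that training representations equal K A and that new points are embedded through the cross-kernel matrix.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x_i - y_j||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


class KernelRepresentation:
    """Hypothetical representation map in representer form: f(x) = A^T k_X(x)."""

    def __init__(self, X_train, A, gamma=1.0):
        self.X_train = X_train  # n training points, shape (n, p)
        self.A = A              # coefficient matrix, shape (n, d)
        self.gamma = gamma

    def embed(self, X):
        # Cross-kernel between query points and training points, shape (m, n).
        K_cross = rbf_kernel(X, self.X_train, self.gamma)
        return K_cross @ self.A  # representations, shape (m, d)


rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))
A = rng.normal(size=(20, 3))  # placeholder: a real model would learn A
model = KernelRepresentation(X_train, A, gamma=0.5)

# On the training data, the representations depend on the inputs only via K.
K = rbf_kernel(X_train, X_train, gamma=0.5)
Z_train = K @ A
Z_embed = model.embed(X_train)
print(np.allclose(Z_train, Z_embed))  # True

# New points are embedded with the same coefficients via the cross-kernel.
Z_new = model.embed(rng.normal(size=(4, 5)))
```

The design choice here is that the model stores only the training inputs and A, so every representation, for training or unseen points, is a linear combination of kernel evaluations.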