Deep supervised learning algorithms generally require large numbers of labeled examples to achieve satisfactory performance. However, collecting and labeling so many examples can be costly and time-consuming. As a subset of unsupervised learning, self-supervised learning (SSL) aims to learn useful features from unlabeled examples without any human-annotated labels. SSL has recently attracted much attention, and many related algorithms have been developed. However, few comprehensive studies explain the connections among different SSL variants and how they have evolved. In this paper, we review SSL methods from four perspectives: algorithms, applications, three main trends, and open questions. First, we introduce the motivations of most SSL algorithms in detail and compare their commonalities and differences. Second, we discuss typical applications of SSL in domains such as image processing and computer vision (CV), as well as natural language processing (NLP). Finally, we discuss the three main trends of SSL and the open research questions. A collection of useful materials is available at https://github.com/guijiejie/SSL.