Self-supervised learning (SSL) is a machine learning approach for pretraining deep neural networks (DNNs) without manually labeled data. Its central idea is an auxiliary stage, known as the pretext task, in which labeled data are created automatically through data augmentation and used to pretrain the DNN. However, the effect of each pretext task has not been well studied or compared in the literature. In this paper, we study the contribution of augmentation operators to the performance of self-supervised learning algorithms in a constrained setting. We propose an evolutionary search method for optimizing the data augmentation pipeline in pretext tasks and measure the impact of augmentation operators in several state-of-the-art (SOTA) SSL algorithms. By encoding different combinations of augmentation operators in chromosomes, we seek optimal augmentation policies through an evolutionary optimization mechanism. We further introduce methods for analyzing and explaining the performance of the optimized SSL algorithms. Our results indicate that the proposed method can find solutions that outperform the SSL algorithms in classification accuracy, which confirms the influence of augmentation policy choice on the overall performance of SSL algorithms. We also compare the optimal SSL solutions found by our evolutionary search mechanism and show the effect of batch size in the pretext task on two visual datasets.
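To make the chromosome encoding concrete, the following is a minimal sketch of an evolutionary search over augmentation policies, not the implementation used in the paper. The operator pool, the chromosome length, the (operator, intensity) gene layout, the elitist selection scheme, and the `toy_fitness` function are all assumptions introduced for illustration. In a real setting the fitness would be obtained by pretraining an SSL model with the candidate policy and measuring downstream classification accuracy.

```python
import random

# Hypothetical operator pool (assumption: the abstract does not list the operators).
OPERATORS = ["crop", "flip", "color_jitter", "grayscale", "blur", "rotate", "solarize"]
CHROMOSOME_LEN = 3  # assumed number of operators per augmentation policy


def random_gene(rng):
    """One gene is an (operator name, intensity in [0, 1]) pair."""
    return (rng.choice(OPERATORS), round(rng.random(), 2))


def random_chromosome(rng):
    """A chromosome is an ordered list of genes, i.e. one augmentation policy."""
    return [random_gene(rng) for _ in range(CHROMOSOME_LEN)]


def toy_fitness(chromosome):
    """Stand-in fitness (assumption). A real run would pretrain an SSL model
    with this policy and return its downstream classification accuracy."""
    target = {"crop": 0.8, "color_jitter": 0.5, "flip": 1.0}
    score, seen = 0.0, set()
    for op, intensity in chromosome:
        if op in target and op not in seen:
            score += 1.0 - abs(target[op] - intensity)
            seen.add(op)
    return score


def crossover(a, b, rng):
    """Single-point crossover between two parent policies."""
    point = rng.randint(1, CHROMOSOME_LEN - 1)
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.2):
    """Replace each gene with a fresh random gene with probability `rate`."""
    return [random_gene(rng) if rng.random() < rate else gene for gene in chromosome]


def evolve(fitness, generations=20, pop_size=16, elite=4, seed=0):
    """Elitist evolutionary loop; returns the best policy and the
    best fitness observed at the start of each generation."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:elite]  # the top policies survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve(toy_fitness)
    print("best policy:", best)
    print("best fitness per generation:", history)
```

Because the top policies are carried over unchanged, the best fitness in this sketch never decreases from one generation to the next. With an expensive fitness such as SSL pretraining, caching fitness values per chromosome would avoid re-evaluating the surviving parents.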