Self-supervised learning is a popular method because it can learn image features without labels, which overcomes the limited labeled datasets that constrain supervised learning. It works by first training the model on a pretext task and then applying the model to a specific task. Examples of pretext tasks used for image recognition include rotation prediction, solving jigsaw puzzles, and predicting relative positions within an image. Previous studies have used only one type of transformation as a pretext task. This raises the question of what happens when more than one pretext task is used and a gating network combines them. We therefore propose Gated Self-Supervised Learning, a method for improving image classification that uses more than one transformation as a pretext task and uses a Mixture of Experts architecture as a gating network to combine the pretext tasks, so that the model automatically learns to focus on the augmentations most useful for classification. We test the proposed method in several scenarios: classification on imbalanced CIFAR datasets, adversarial perturbations, Tiny-ImageNet classification, and semi-supervised learning. In addition, we use Grad-CAM and t-SNE analyses to examine how well the proposed method identifies the features that influence image classification, represents the data of each class, and separates different classes. Our code is available at https://github.com/aristorenaldo/G-SSL
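The gating idea described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that); it only assumes that a gating network maps image features to one softmax weight per pretext task and that these weights scale the pretext losses added to the classification loss. The function names, the feature size, the linear gate, and the loss values are all hypothetical.

```python
import math
import random

# Pretext tasks named in the abstract; their use here is illustrative only.
PRETEXT_TASKS = ["rotation", "jigsaw", "relative_position"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(features, gate_matrix):
    """Hypothetical linear gate: one logit per pretext task, then softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in gate_matrix]
    return softmax(logits)


def gated_loss(cls_loss, pretext_losses, weights):
    """Classification loss plus the gate-weighted sum of pretext losses (assumed form)."""
    return cls_loss + sum(w * l for w, l in zip(weights, pretext_losses))


random.seed(0)
# Stand-in for a backbone feature vector (assumed size 8).
features = [random.gauss(0.0, 1.0) for _ in range(8)]
# Stand-in for gate parameters that would normally be learned.
gate_matrix = [[random.gauss(0.0, 0.1) for _ in range(8)] for _ in PRETEXT_TASKS]

weights = gate_weights(features, gate_matrix)
# Made-up loss values, used only to show how the terms combine.
total = gated_loss(cls_loss=1.2, pretext_losses=[0.9, 0.4, 0.7], weights=weights)

for task, w in zip(PRETEXT_TASKS, weights):
    print(f"{task}: weight {w:.3f}")
print(f"combined loss: {total:.3f}")
```

Because the weights come from a softmax, they are positive and sum to one, so the pretext term is always a convex combination of the individual pretext losses. In a trained model the gate parameters would be learned together with the backbone, which is what would let the model emphasize the more useful tasks.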