The recent success of learning-based algorithms can be largely attributed to the immense amount of annotated data used for training. Yet many datasets lack annotations because labeling is costly, which degrades the performance of deep learning methods. Self-supervised learning is frequently adopted to reduce the reliance on massive labeled datasets, since it exploits unlabeled data to learn relevant feature representations. In this work, we propose SS-StyleGAN, a self-supervised approach for image annotation and classification suited to extremely small annotated datasets. This novel framework adds self-supervision to the StyleGAN architecture by integrating an encoder that learns an embedding into the StyleGAN latent space, which is well known for its disentanglement properties. The learned latent space enables an informed selection of representative samples to be labeled, which improves classification performance. We show that the proposed method attains strong classification results with labeled datasets containing only 50, or even 10, samples. We demonstrate the superiority of our approach on the tasks of COVID-19 and liver tumor pathology identification.