Self-supervised learning (SSL) provides a strategy for constructing useful representations of images without relying on hand-assigned labels. Many such methods aim to map distinct views of the same scene or object to nearby points in the representation space, while employing some constraint to prevent representational collapse. Here we recast the problem in terms of efficient coding by adopting manifold capacity, a measure that quantifies the quality of a representation based on the number of linearly separable object manifolds it can support, as the efficiency metric to optimize. Specifically, we adapt the manifold capacity for use as an objective function in a contrastive learning framework, yielding a Maximum Manifold Capacity Representation (MMCR). We apply this method to unlabeled images, each augmented by a set of basic transformations, and find that it learns meaningful features, as assessed with the standard linear evaluation protocol. In particular, MMCRs support object recognition performance comparable to or surpassing that of recently developed SSL frameworks, while providing greater robustness to adversarial attacks. Empirical analyses reveal differences between MMCRs and representations learned by other SSL frameworks, and suggest a mechanism by which manifold compression gives rise to class separability.
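The abstract does not spell out the form of the objective. As a rough illustration only, the sketch below assumes one simple capacity-style surrogate: normalize each view embedding to unit length, average the views of each image into a centroid, and maximize the nuclear norm of the centroid matrix. The function name `mmcr_style_loss`, the array layout, and the choice of the nuclear norm are assumptions made for this example, not details taken from the text above.

```python
import numpy as np


def mmcr_style_loss(z):
    """Hypothetical capacity-style loss (an assumption, not the paper's stated form).

    z: array of shape (n_images, n_views, dim) holding one embedding
       per augmented view of each image.
    Returns the negative nuclear norm of the per-image centroid matrix.
    """
    # Project every view embedding onto the unit sphere.
    z = z / np.linalg.norm(z, axis=-1, keepdims=True)
    # Average over views: one centroid per image, shape (n_images, dim).
    # Tightly clustered views give a centroid norm close to 1.
    centroids = z.mean(axis=1)
    # Nuclear norm = sum of singular values. It is large when centroids
    # are long (compact manifolds) and spread over many directions.
    return -np.linalg.norm(centroids, ord="nuc")


# Collapsed representation: every view of every image maps to the same point.
collapsed = np.tile(np.array([1.0, 0.0, 0.0, 0.0]), (4, 3, 1))
# Well-separated representation: each image occupies its own direction.
spread = np.repeat(np.eye(4)[:, None, :], 3, axis=1)

print(mmcr_style_loss(collapsed))
print(mmcr_style_loss(spread))
```

Under this assumed surrogate, the collapsed case scores worse (a higher loss) than the well-separated case, which is the behavior a collapse-preventing constraint is meant to produce.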