Uniformity plays an important role in evaluating learned representations and offers insight into self-supervised learning. In search of an effective uniformity metric, we identify four principled properties that such a metric should possess: it should be invariant to instance permutations and to sample replications, and it should accurately capture feature redundancy and dimensional collapse. Surprisingly, we find that the uniformity metric proposed by \citet{Wang2020UnderstandingCR} fails to satisfy most of these properties. Specifically, their metric is sensitive to sample replications and cannot correctly account for feature redundancy or dimensional collapse. To overcome these limitations, we introduce a new uniformity metric based on the Wasserstein distance, which satisfies all four properties. Integrating this new metric into existing self-supervised learning methods effectively mitigates dimensional collapse and consistently improves their performance on downstream tasks on the CIFAR-10 and CIFAR-100 datasets. Code is available at \url{https://github.com/statsle/WassersteinSSL}.
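The sensitivity to sample replications noted above can be illustrated with a small numerical sketch. It assumes the commonly used form of the uniformity loss of \citet{Wang2020UnderstandingCR}, namely the logarithm of the mean Gaussian potential over distinct pairs of L2-normalized features with temperature t = 2, where lower values indicate more uniform features. The function name `wang_isola_uniformity`, the random features, and the sizes are hypothetical choices for illustration; they are not taken from the released code.

```python
import numpy as np


def wang_isola_uniformity(z, t=2.0):
    """Assumed form of the uniformity loss: log of the mean of
    exp(-t * ||z_i - z_j||^2) over distinct pairs i < j of
    L2-normalized rows of z. Lower means more uniform."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    # For unit vectors, ||a - b||^2 = 2 - 2 * <a, b>.
    sq_dists = np.clip(2.0 - 2.0 * (z @ z.T), 0.0, None)
    upper = np.triu_indices(len(z), k=1)  # distinct pairs only
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))


# Hypothetical features: 256 random points in 32 dimensions.
rng = np.random.default_rng(0)
z = rng.standard_normal((256, 32))

base = wang_isola_uniformity(z)
# Reordering the instances leaves the value unchanged (up to rounding).
permuted = wang_isola_uniformity(z[rng.permutation(len(z))])
# Replicating every sample once adds zero-distance pairs and shifts the value.
replicated = wang_isola_uniformity(np.concatenate([z, z], axis=0))

print(f"base={base:.4f} permuted={permuted:.4f} replicated={replicated:.4f}")
```

In this sketch the permuted value matches the base value, whereas the replicated value is larger: each duplicated sample contributes a zero-distance pair, so the measured uniformity changes even though the underlying empirical distribution is the same.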