Self-supervised learning (SSL) has emerged as a powerful paradigm for learning representations without labeled data, often by enforcing invariance to input transformations such as rotations or blurring. Recent studies have highlighted two pivotal properties of effective representations: (i) avoiding dimensional collapse, where the learned features occupy only a low-dimensional subspace, and (ii) enhancing the uniformity of the induced distribution. In this work, we introduce T-REGS, a simple regularization framework for SSL based on the length of the Minimum Spanning Tree (MST) over the learned representations. We provide a theoretical analysis demonstrating that T-REGS simultaneously mitigates dimensional collapse and promotes distribution uniformity on arbitrary compact Riemannian manifolds. Experiments on synthetic data and on classical SSL benchmarks validate the effectiveness of our approach at enhancing representation quality.
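The core quantity behind the regularizer, the total length of the Euclidean MST over a batch of embeddings, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of SciPy's dense MST routine, and the sign convention (rewarding a longer MST to discourage collapsed, clumped representations) are assumptions for exposition.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_length(embeddings: np.ndarray) -> float:
    """Total edge length of the Euclidean MST over a batch of embeddings.

    embeddings: array of shape (n_points, dim).
    """
    dist = squareform(pdist(embeddings))   # dense pairwise distance matrix
    mst = minimum_spanning_tree(dist)      # sparse matrix holding MST edge weights
    return float(mst.sum())               # sum of the n-1 MST edge lengths

# Hypothetical regularization term: penalize a short MST (equivalently,
# reward spread-out representations) by taking the negative length.
def t_regs_penalty(embeddings: np.ndarray) -> float:
    return -mst_length(embeddings)
```

For three collinear points at x = 0, 1, 3, the MST consists of the edges of lengths 1 and 2, so `mst_length` returns 3.0; in practice this quantity would be computed on (differentiable approximations of) mini-batch embeddings during training.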