Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse, also known as the "underfilling" phenomenon, is one of the major causes of degraded performance on downstream tasks. Previous work has investigated the dimensional collapse problem of SSL at a global level. In this paper, we demonstrate that representations can span a high-dimensional space globally yet collapse locally. To address this, we propose a method called $\textit{local dimensionality regularization (LDReg)}$. Our formulation is derived from the Fisher-Rao metric, which we use to compare and optimize local distance distributions at an asymptotically small radius around each data point. Through a range of experiments, we demonstrate that increasing the local intrinsic dimensionality with LDReg improves the representation quality of SSL. The results also show that LDReg can regularize dimensionality at both local and global levels.
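To make the distinction between global and local dimensionality concrete, the following is a minimal sketch rather than the paper's implementation. It uses the standard maximum-likelihood (Hill-type) estimator of local intrinsic dimensionality (LID) computed from k-nearest-neighbour distances. The helper names `lid_mle`, `ld_penalty`, and `axis_segments`, the choice `k=20`, and the toy data are all assumptions made for illustration. In particular, `ld_penalty` (the negative mean log-LID) is only one plausible way to turn "increase the local intrinsic dimensionality" into a loss term; the abstract does not specify the exact form of the regularizer.

```python
import numpy as np


def lid_mle(points, k=20, eps=1e-12):
    """Maximum-likelihood (Hill-type) LID estimate for every row of `points`."""
    # Pairwise Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero self-distance, so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    r_k = knn[:, -1:]  # distance to the k-th neighbour
    # LID = -1 / mean_i log(r_i / r_k); eps guards against duplicate points.
    log_ratio = np.log((knn + eps) / (r_k + eps))
    return -1.0 / np.minimum(log_ratio.mean(axis=1), -eps)


def ld_penalty(lids):
    """Hypothetical regularizer (an assumption): negative mean log-LID.

    Minimizing this value pushes the per-point LID estimates upward.
    """
    return -np.mean(np.log(lids))


def axis_segments(n_per_axis, dim, rng):
    """Toy data: one 1-D segment along each coordinate axis of R^dim.

    The union spans all `dim` directions globally, but every point
    lives on a one-dimensional segment locally.
    """
    t = rng.uniform(1.0, 2.0, size=(dim, n_per_axis))
    pts = np.zeros((dim, n_per_axis, dim))
    for j in range(dim):
        pts[j, :, j] = t[j]
    return pts.reshape(-1, dim)


rng = np.random.default_rng(0)
datasets = {
    "axis segments": axis_segments(75, 8, rng),
    "isotropic Gaussian": rng.normal(size=(600, 8)),
}
for name, x in datasets.items():
    global_rank = np.linalg.matrix_rank(np.cov(x, rowvar=False))
    lids = lid_mle(x)
    print(f"{name}: covariance rank={global_rank}, "
          f"mean LID={lids.mean():.2f}, penalty={ld_penalty(lids):.2f}")
```

Both toy datasets have a full-rank covariance matrix, so a purely global diagnostic would not separate them. The per-point LID estimate should, however, be much lower for the axis-segment data than for the Gaussian data, which is the kind of local collapse the abstract describes.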