Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace has extremely low dimensionality and thus fails to represent the full data distribution and its modalities. Dimensional collapse, also known as the "underfilling" phenomenon, is one of the major causes of degraded performance on downstream tasks. Previous work has investigated the dimensional collapse problem of SSL at a global level. In this paper, we demonstrate that representations can span a high-dimensional space globally yet collapse locally. To address this, we propose a method called $\textit{local dimensionality regularization (LDReg)}$. Our formulation is based on a derivation of the Fisher-Rao metric to compare and optimize local distance distributions at an asymptotically small radius for each data point. We demonstrate through a range of experiments that, by increasing the local intrinsic dimensionality, LDReg improves the representation quality of SSL. The results also show that LDReg can regularize dimensionality at both local and global levels.
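To make the distinction between global and local dimensionality concrete, the following is a minimal sketch, not the LDReg method itself. It assumes the commonly used maximum-likelihood estimator of local intrinsic dimensionality (LID) computed from k-nearest-neighbor distances, and it uses a synthetic dataset that is an assumption of this example: eight well-separated 2-D patches embedded in a 16-D space. The function names `lid_mle` and `make_locally_collapsed`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def lid_mle(points, k=20):
    """Per-point LID estimate from k-nearest-neighbor distances.

    Uses the maximum-likelihood form
        LID(x) = -( (1/k) * sum_i ln(r_i / r_k) )^(-1),
    where r_1 <= ... <= r_k are the distances from x to its k nearest neighbors.
    """
    points = np.asarray(points, dtype=np.float64)
    # Squared pairwise Euclidean distances via the Gram matrix.
    sq_norms = np.sum(points ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (points @ points.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    dist = np.sqrt(d2)
    np.fill_diagonal(dist, np.inf)   # a point is not its own neighbor
    knn = np.sort(dist, axis=1)[:, :k]
    r_k = knn[:, -1:]                # distance to the k-th neighbor
    log_ratio = np.log(np.maximum(knn, 1e-12) / r_k)
    mean_log = np.minimum(log_ratio.mean(axis=1), -1e-12)  # avoid division by zero
    return -1.0 / mean_log


def make_locally_collapsed(n_per_patch=100, n_patches=8, seed=0):
    """Synthetic data (an assumption of this sketch): each patch is a 2-D square
    placed on its own pair of coordinate axes in a (2 * n_patches)-D space."""
    rng = np.random.default_rng(seed)
    dim = 2 * n_patches
    data = np.zeros((n_per_patch * n_patches, dim))
    for c in range(n_patches):
        rows = slice(c * n_per_patch, (c + 1) * n_per_patch)
        # Coordinates in [1, 2] keep different patches far apart from each other.
        data[rows, 2 * c:2 * c + 2] = rng.uniform(1.0, 2.0, size=(n_per_patch, 2))
    return data


if __name__ == "__main__":
    x = make_locally_collapsed()
    global_rank = np.linalg.matrix_rank(x - x.mean(axis=0))
    print("global rank of centered data:", global_rank)
    print("mean local intrinsic dimensionality:", lid_mle(x, k=20).mean())
```

On this toy data the centered data matrix has full rank, so a global measure reports a high-dimensional representation, while the per-point LID estimates stay close to the dimensionality of a single patch. This is the kind of gap between global and local dimensionality that the abstract describes; how LDReg itself regularizes LID during training is not shown here.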