In this work, we observe a counterintuitive phenomenon in self-supervised learning (SSL): longer training may impair performance on dense prediction tasks (e.g., semantic segmentation). We refer to this phenomenon as Self-supervised Dense Degradation (SDD) and demonstrate its consistent presence across sixteen state-of-the-art SSL methods spanning a variety of losses, architectures, and datasets. Since the final checkpoint may perform suboptimally on dense tasks, monitoring dense-task performance during training becomes essential. However, evaluating dense performance effectively without annotations remains an open challenge. To tackle this issue, we introduce a Dense representation Structure Estimator (DSE), composed of a class-relevance measure and an effective dimensionality measure. The proposed DSE is both theoretically grounded and empirically validated to correlate closely with downstream performance. Based on this metric, we introduce a straightforward yet effective model selection strategy and a DSE-based regularization method. Experiments on sixteen SSL methods across four benchmarks confirm that model selection improves mIoU by $3.0\%$ on average with negligible computational cost. Additionally, DSE regularization consistently mitigates the effects of dense degradation. Code is available at https://github.com/EldercatSAM/SSL-Degradation.
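As a rough illustration of the effective dimensionality component mentioned above, a common annotation-free proxy is the exponential of the entropy of the normalized eigenvalue spectrum of the feature covariance. This is a hedged sketch only; the paper's exact DSE definition may differ, and the function name is illustrative:

```python
import numpy as np

def effective_dimensionality(features: np.ndarray) -> float:
    """Effective dimensionality of a (n_samples, dim) feature matrix,
    estimated as exp(entropy) of the normalized eigenvalue spectrum
    of the feature covariance. A collapsed representation (rank 1)
    yields ~1; an isotropic one approaches dim."""
    # Center the features before computing the covariance.
    X = features - features.mean(axis=0, keepdims=True)
    cov = X.T @ X / max(len(X) - 1, 1)
    # Covariance is symmetric, so eigvalsh is appropriate.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = eigvals / eigvals.sum()
    p = p[p > 0]  # drop zero eigenvalues; 0*log(0) := 0
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))
```

A metric like this can be tracked across checkpoints without labels, which is the kind of signal the abstract's model selection strategy relies on.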