Self-supervised learning methods prevent embedding collapse via modeling heuristics or explicit regularization of the embedding space. Among the latter, VICReg decomposes regularization into variance and covariance objectives, offering flexibility and interpretability. However, covariance captures only second-order statistics -- encouraging decorrelation but failing to enforce the full distributional shape needed for stable training. Sketching-based methods such as SIGReg address this by aligning embeddings to an isotropic Gaussian, but lack flexibility and suffer from vanishing gradients under collapse. We propose Variance-Invariance-Sketching Regularization (VISReg), which replaces covariance with a Sliced-Wasserstein-based sketching objective that enforces full distributional shape, while retaining a variance term for scale control. By decoupling scale and shape, VISReg combines VICReg's flexibility with the distributional rigor of sketching methods, providing robust gradients even under collapse. We show that VISReg scales linearly, outperforms existing regularization on low-quality datasets, and is resilient to long-tailed and low-rank regimes. Pre-trained on ImageNet-1K, VISReg achieves state-of-the-art performance on out-of-distribution datasets. Pre-trained on ImageNet-22K, it matches DINOv2's OOD performance despite the latter using 10x more data (LVD-142M). Project and code: https://haiyuwu.github.io/visreg.
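To make the decomposition above concrete, the following is a minimal NumPy sketch of a three-term loss in the spirit of VISReg: an invariance term between two views, a variance term for scale, and a sliced-Wasserstein term for shape. It is not the authors' implementation. The abstract does not give the exact form of any term, so the hinge on the per-dimension standard deviation (borrowed from VICReg), the per-dimension standardization before sketching, the quantile-matching estimator, the number of slices, and the weights `w_inv`, `w_var`, and `w_sketch` are all assumptions, and every function name is hypothetical. The code only evaluates loss values; it does not compute gradients and makes no claim about gradient behavior under collapse.

```python
import numpy as np
from statistics import NormalDist


def variance_term(z, target_std=1.0, eps=1e-4):
    """Scale control: hinge on the per-dimension std (assumed VICReg-style form)."""
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return float(np.mean(np.maximum(0.0, target_std - std)))


def sliced_w2_to_gaussian(z, num_slices=64, rng=None):
    """Shape control: sliced squared 2-Wasserstein distance to N(0, I).

    Assumption: embeddings are standardized per dimension first, so this
    term measures shape only and leaves scale to variance_term.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    n, d = z.shape
    z = (z - z.mean(axis=0)) / (z.std(axis=0, ddof=1) + 1e-6)

    # Random unit directions, one per column.
    dirs = rng.standard_normal((d, num_slices))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)

    # Sorted 1-D projections are the empirical quantiles of each slice.
    proj = np.sort(z @ dirs, axis=0)  # shape (n, num_slices)

    # Standard normal quantiles at the midpoints (i + 0.5) / n.
    nd = NormalDist()
    q = np.array([nd.inv_cdf((i + 0.5) / n) for i in range(n)])

    # In 1-D, W2^2 between two n-point samples is the mean squared
    # difference of their sorted values.
    return float(np.mean((proj - q[:, None]) ** 2))


def visreg_style_loss(z_a, z_b, w_inv=1.0, w_var=1.0, w_sketch=1.0, rng=None):
    """Weighted sum of invariance, variance, and sketching terms (weights hypothetical)."""
    inv = float(np.mean((z_a - z_b) ** 2))
    var = 0.5 * (variance_term(z_a) + variance_term(z_b))
    sketch = 0.5 * (
        sliced_w2_to_gaussian(z_a, rng=rng) + sliced_w2_to_gaussian(z_b, rng=rng)
    )
    return w_inv * inv + w_var * var + w_sketch * sketch


# Usage: two noisy views of the same synthetic embeddings.
_rng = np.random.default_rng(1)
_base = _rng.standard_normal((256, 16))
_view_a = _base + 0.05 * _rng.standard_normal(_base.shape)
_view_b = _base + 0.05 * _rng.standard_normal(_base.shape)
example_loss = visreg_style_loss(_view_a, _view_b)
```

In this sketch the two regularizers respond to different failures: `variance_term` grows when a dimension's spread shrinks toward zero, while `sliced_w2_to_gaussian` grows when the standardized embeddings are non-Gaussian in shape, for example when they concentrate on a few points in a low-rank subspace.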