Expanding SPHERE-JEPA: A Family of Statistical Regularizers for the Hypersphere

In Self-Supervised Learning (SSL), preventing representation collapse by explicitly enforcing a uniform distribution on the unit hypersphere has proven effective. However, current frameworks typically rely on sliced statistical regularizers such as SIGReg (used in LeJEPA) and SUSReg (used in SPHERE-JEPA), which approximate this continuous objective via Monte Carlo sampling along random 1D directions. This stochasticity injects projection variance into the training gradients, which destabilizes optimization and hinders convergence. In this work, we first show that analytically integrating out these random projections yields a deterministic Maximum Mean Discrepancy (MMD), bypassing the variance of sliced methods. Motivated by this equivalence, we formulate full-dimensional objectives for MMD, Kernel Stein Discrepancy (KSD), and Kullback-Leibler (KL) divergence directly on the sphere to enforce a uniform distribution. To prevent spatial bias, we equip these tests with rotationally invariant kernels constructed via spectral theory, systematically evaluating two canonical families: smooth exponential decay (Heat) and strict frequency cutoff (Bandlimited) filters. Empirically, removing projection-induced noise results in more stable optimization, faster convergence, and consistent improvements over stochastic sliced regularizers on ImageNet and Galaxy10. Furthermore, we reveal that the choice of statistical test shapes the geometry of the learned latent space: MMD and KSD favor a locally clustered organization suited to object-centric domains, whereas the continuous KDE-based KL divergence promotes fine-grained instance separation, yielding the strongest results on unclustered procedural texture retrieval.
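
To make the idea of a deterministic, rotationally invariant MMD to the uniform distribution concrete, the following is a minimal dependency-free sketch and not the paper's implementation. It assumes a zonal kernel built from a spectral filter over Gegenbauer polynomials on the sphere in R^d (d >= 3), truncated at a finite degree, with the overall normalization constant dropped. All function names, the truncation degree, and the heat time `t_heat` are hypothetical choices made for illustration. Because the uniform distribution's mean embedding under a zonal kernel is the constant (degree-0) component, omitting degree 0 from the kernel leaves exactly the squared MMD to uniform, with no random projections involved.

```python
import math
import random


def gegenbauer_values(alpha, max_degree, t):
    """Return [C_0(t), ..., C_max_degree(t)] for Gegenbauer parameter alpha > 0."""
    values = [1.0, 2.0 * alpha * t]
    for l in range(2, max_degree + 1):
        # Standard three-term recurrence for Gegenbauer polynomials.
        nxt = (2.0 * (l + alpha - 1.0) * t * values[l - 1]
               - (l + 2.0 * alpha - 2.0) * values[l - 2]) / l
        values.append(nxt)
    return values[: max_degree + 1]


def heat_weights(dim, max_degree, t_heat):
    """Smooth exponential decay: exp(-t * l * (l + dim - 2)) for l = 1..max_degree."""
    return [math.exp(-t_heat * l * (l + dim - 2)) for l in range(1, max_degree + 1)]


def bandlimited_weights(max_degree):
    """Strict frequency cutoff: weight 1 for l = 1..max_degree, nothing beyond."""
    return [1.0] * max_degree


def mmd2_to_uniform(points, weights):
    """Biased (V-statistic) squared MMD between unit vectors and the uniform law.

    Assumed kernel: k(x, y) = sum_{l>=1} w_l * (l + alpha) / alpha * C_l^alpha(x . y),
    with alpha = (dim - 2) / 2. Degree 0 is omitted, which subtracts the
    uniform mean embedding. Requires dim >= 3.
    """
    dim = len(points[0])
    alpha = (dim - 2) / 2.0
    max_degree = len(weights)
    n = len(points)
    total = 0.0
    for x in points:
        for y in points:
            t = sum(a * b for a, b in zip(x, y))
            t = max(-1.0, min(1.0, t))  # guard against rounding outside [-1, 1]
            c = gegenbauer_values(alpha, max_degree, t)
            total += sum(w * (l + alpha) / alpha * c[l]
                         for l, w in enumerate(weights, start=1))
    return total / (n * n)


def sample_uniform_sphere(n, dim, rng):
    """Draw n points uniformly on the unit sphere by normalizing Gaussians."""
    pts = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(a * a for a in v))
        pts.append([a / norm for a in v])
    return pts


if __name__ == "__main__":
    rng = random.Random(0)
    spread = sample_uniform_sphere(64, 8, rng)
    collapsed = [spread[0]] * 64  # every embedding identical: full collapse
    w = heat_weights(dim=8, max_degree=6, t_heat=0.05)
    print("heat MMD^2, spread   :", mmd2_to_uniform(spread, w))
    print("heat MMD^2, collapsed:", mmd2_to_uniform(collapsed, w))
```

In this sketch the statistic is a deterministic function of the batch, so a collapsed batch scores strictly higher than a well-spread one under either filter. A practical regularizer would vectorize the pairwise computation in an autodiff framework, which this sketch does not attempt.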
