The {\em stop gradient} and {\em exponential moving average} iterative procedures are commonly used in non-contrastive approaches to self-supervised learning to avoid representation collapse, and they achieve excellent downstream performance in practice. This presentation investigates these procedures from the dual viewpoints of optimization and dynamical systems. We show that, in general, although they {\em do not} optimize the original objective, or {\em any} other smooth function, they {\em do} avoid collapse. Following~\citet{Tian21}, but without any of the extra assumptions used in their proofs, we then show from a dynamical systems perspective that, in the linear case, minimizing the original objective function without the use of a stop gradient or exponential moving average {\em always} leads to collapse. Conversely, we characterize explicitly the equilibria of the dynamical systems associated with these two procedures in this linear setting as algebraic varieties in their parameter space, and show that they are, in general, {\em asymptotically stable}. Our theoretical findings are illustrated by empirical experiments with real and synthetic data.
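To make the two procedures concrete, here is a minimal sketch of where the stop gradient and the exponential moving average sit in a BYOL/SimSiam-style non-contrastive update. The network sizes, the \texttt{ema\_rate} value, the cosine loss, and the synthetic inputs are all illustrative assumptions, not the paper's experimental setup.

\begin{verbatim}
import copy
import torch
import torch.nn.functional as F

# Toy online branch (encoder + predictor); shapes are illustrative only.
online_encoder = torch.nn.Linear(32, 16)
predictor = torch.nn.Linear(16, 16)

# Target branch: an EMA copy of the online encoder, never trained by SGD.
target_encoder = copy.deepcopy(online_encoder)
for p in target_encoder.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(
    list(online_encoder.parameters()) + list(predictor.parameters()),
    lr=1e-2,
)
ema_rate = 0.99  # assumed EMA decay rate

def loss_fn(p, z):
    # Negative cosine similarity; detaching z is the stop gradient:
    # no gradient flows back through the target branch.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

# Two augmented views of a batch (synthetic stand-ins here).
x1, x2 = torch.randn(8, 32), torch.randn(8, 32)
p1 = predictor(online_encoder(x1))
z2 = target_encoder(x2)

loss = loss_fn(p1, z2)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Exponential moving average update of the target weights.
with torch.no_grad():
    for t, o in zip(target_encoder.parameters(),
                    online_encoder.parameters()):
        t.mul_(ema_rate).add_(o, alpha=1.0 - ema_rate)
\end{verbatim}

Setting \texttt{ema\_rate = 0} recovers a pure stop-gradient (SimSiam-style) update in which the target simply copies the online weights; these are the two iterative procedures the abstract analyzes.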