Joint-Embedding Predictive Architectures (JEPAs), a powerful class of self-supervised models, exhibit an unexplained ability to cluster time-series data by their underlying dynamical regimes. We propose a novel theoretical explanation for this phenomenon, hypothesizing that JEPA's predictive objective implicitly drives it to learn the invariant subspace of the system's Koopman operator. We prove that an idealized JEPA loss is minimized when the encoder represents the system's regime indicator functions, which are Koopman eigenfunctions with eigenvalue one. We validate this theory on synthetic data with known dynamics, demonstrating that constraining JEPA's linear predictor to be a near-identity operator is the key inductive bias that forces the encoder to learn these invariants. We further argue that this constraint is critical for selecting this interpretable solution from a class of mathematically equivalent but entangled optima, revealing the predictor's role in representation disentanglement. This work demystifies a key behavior of JEPAs, establishes a principled connection between modern self-supervised learning and dynamical systems theory, and informs the design of more robust and interpretable time-series models.
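To make the objective concrete, the following is a minimal sketch in PyTorch of a JEPA-style loss with the near-identity predictor constraint described above. All names (`TinyJEPA`, `lam`) and architectural details (MLP encoder, stop-gradient target branch, squared-error penalty pulling the predictor weight toward the identity) are illustrative assumptions, not the paper's exact training recipe; in particular, anti-collapse machinery (e.g., an EMA target encoder) is omitted for brevity.

```python
import torch
import torch.nn as nn

class TinyJEPA(nn.Module):
    """Illustrative JEPA sketch: an encoder plus a linear predictor
    that is regularized toward the identity operator."""

    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, embed_dim)
        )
        # Linear predictor acting in embedding space.
        self.predictor = nn.Linear(embed_dim, embed_dim, bias=False)

    def loss(self, x_t: torch.Tensor, x_next: torch.Tensor,
             lam: float = 1.0) -> torch.Tensor:
        z_t = self.encoder(x_t)
        with torch.no_grad():  # stop-gradient target branch
            z_next = self.encoder(x_next)
        # Predictive term: one-step prediction in embedding space.
        pred_loss = ((self.predictor(z_t) - z_next) ** 2).mean()
        # Near-identity regularizer: the inductive bias the abstract
        # credits with selecting Koopman invariants. Note: a real JEPA
        # also needs an anti-collapse mechanism, omitted here.
        eye = torch.eye(self.predictor.out_features,
                        device=self.predictor.weight.device)
        near_id = ((self.predictor.weight - eye) ** 2).sum()
        return pred_loss + lam * near_id

model = TinyJEPA(in_dim=3, embed_dim=8)
x_t, x_next = torch.randn(32, 3), torch.randn(32, 3)
print(model.loss(x_t, x_next))
```

As `lam` grows, the predictor approaches the identity, so the predictive term can only be reduced by encoder coordinates that are (approximately) unchanged by one step of the dynamics, i.e., Koopman eigenfunctions with eigenvalue near one, of which regime indicator functions are the canonical example.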