Geometric regularization of autoencoders via observed stochastic dynamics

Stochastic dynamical systems with slow or metastable behavior evolve, on long time scales, on an unknown low-dimensional manifold in a high-dimensional ambient space. Building a reduced simulator from short-burst ambient ensembles is a long-standing problem: local-chart methods like ATLAS suffer from exponential landmark scaling and per-step reprojection, while autoencoder alternatives leave tangent-bundle geometry poorly constrained, and the resulting errors propagate into the learned drift and diffusion. We observe that the ambient covariance $\Lambda$ already encodes coordinate-invariant tangent-space information, since its range spans the tangent bundle. Using this, we construct a tangent-bundle penalty and an inverse-consistency penalty for a three-stage pipeline (chart learning, latent drift, latent diffusion) that learns a single nonlinear chart and the latent SDE. The penalties induce a function-space metric, the $\rho$-metric, that is strictly weaker than the Sobolev $H^1$ norm yet achieves the same chart-quality generalization rate up to logarithmic factors. For the drift, we derive an encoder-pullback target via Itô's formula on the learned encoder and prove a bias decomposition showing that the standard decoder-side formula carries systematic error for any imperfect chart. Under a $W^{2,\infty}$ chart-convergence assumption, chart-level error propagates controllably to weak convergence of the ambient dynamics and to convergence of radial mean first-passage times (MFPTs). In experiments on four surfaces embedded in up to $201$ ambient dimensions, the method reduces radial MFPT error by $50$--$70\%$ under rotation dynamics and achieves the lowest inter-well MFPT error on most surface--transition pairs under metastable Müller--Brown Langevin dynamics, while reducing end-to-end ambient coefficient errors by up to an order of magnitude relative to an unregularized autoencoder.
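To make the observation about $\Lambda$ concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy system (Brownian motion on the unit sphere in three ambient dimensions), estimates $\Lambda$ at one point from one-step short bursts, and uses the top eigenvectors as a tangent-space basis. The penalty shown, the squared Frobenius norm of the normal component of a decoder Jacobian, is one plausible form of a tangent-bundle penalty and is an assumption here; the function names `estimate_covariance`, `tangent_projector`, and `tangent_penalty` are hypothetical.

```python
import numpy as np


def estimate_covariance(x, n_paths=20000, h=1e-4, seed=0):
    """Estimate the ambient covariance Lambda(x) from one-step short bursts.

    Toy dynamics (an assumption for illustration): Brownian motion on the
    unit sphere, dX = -((D-1)/2) X dt + P(X) dW with P(x) = I - x x^T.
    """
    rng = np.random.default_rng(seed)
    D = x.shape[0]
    P = np.eye(D) - np.outer(x, x)           # true tangent projector at x
    drift = -0.5 * (D - 1) * x                # Ito drift of the toy system
    xi = rng.standard_normal((n_paths, D))
    dx = drift * h + np.sqrt(h) * xi @ P.T    # one Euler-Maruyama increment
    dx_centered = dx - dx.mean(axis=0)        # remove the drift contribution
    return dx_centered.T @ dx_centered / (n_paths * h)


def tangent_projector(Lam, d):
    """Orthogonal projector onto the span of the top-d eigenvectors of Lam."""
    _, V = np.linalg.eigh(Lam)                # eigenvalues in ascending order
    U = V[:, -d:]                             # eigenvectors of the d largest
    return U @ U.T


def tangent_penalty(J_dec, P_hat):
    """Squared Frobenius norm of the normal component of a decoder Jacobian.

    J_dec has shape (D, d); the penalty is zero when every column of J_dec
    lies in the estimated tangent space. This form is an assumed stand-in.
    """
    N = np.eye(P_hat.shape[0]) - P_hat        # projector onto normal space
    return float(np.sum((N @ J_dec) ** 2))


# Usage at the north pole of the sphere, where the tangent plane is z = 0.
x = np.array([0.0, 0.0, 1.0])
Lam_hat = estimate_covariance(x)
P_hat = tangent_projector(Lam_hat, d=2)

J_good = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # tangent columns
J_bad = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.0]])   # leaks into normal

penalty_good = tangent_penalty(J_good, P_hat)
penalty_bad = tangent_penalty(J_bad, P_hat)
print(penalty_good, penalty_bad)
```

In this sketch the penalty needs only the range of the estimated covariance, not any particular latent coordinates, which is the sense in which the information is coordinate-invariant. A Jacobian with a normal component is penalized, while one whose columns lie in the tangent plane is not.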
