Time-delay embedding is a powerful technique for reconstructing the state space of a nonlinear time series. However, the fidelity of the reconstruction rests on the assumption that the time-delay map is an embedding, an assumption implicitly justified by Takens' embedding theorem but rarely scrutinised in practice. In this work, we argue that time-delay reconstruction is not always an embedding, and that the non-injectivity of the time-delay map induced by a given measurement function causes irreducible information loss, degrading downstream model performance. Our analysis reveals that this local self-overlap stems from inherent dynamical properties, governed by the competition between the dynamical penalty and the curvature penalty, and that the irreducible information loss scales with the product of the geometric separation and the probability mass. We establish a measure-theoretic framework that lifts the dynamics to the space of probability measures, in which the multi-valued evolution induced by non-injectivity is quantified by how far the $n$-step conditional kernel $K^{n}(x, \cdot)$ deviates from a Dirac mass, and we introduce the intrinsic stochasticity $\mathcal{E}^{*}_{n}$, an almost-everywhere, data-driven certificate of deterministic closure, to quantify this irreducible information loss without any prior information. We demonstrate that $\mathcal{E}^{*}_{n}$ improves reconstruction quality and downstream model performance on both synthetic and real-world nonlinear data sets.
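To make the kernel-deviation idea concrete, the sketch below gives a minimal, illustrative diagnostic; it is not the paper's construction of $\mathcal{E}^{*}_{n}$. The names `delay_embed` and `kernel_spread` and the parameters `dim`, `tau`, `horizon`, and `eps` are hypothetical, introduced only for this example. For each delay vector it measures the dispersion of the $n$-step-ahead observations over an $\varepsilon$-neighbourhood in delay space: near zero wherever $K^{n}(x, \cdot)$ is close to a Dirac mass (deterministic closure), and persistently positive where self-overlap makes the evolution multi-valued.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Standard time-delay embedding: row i is
    (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[j * tau : j * tau + n] for j in range(dim)], axis=1)

def kernel_spread(x, dim, tau, horizon, eps):
    """Illustrative diagnostic (an assumption, not the paper's E*_n):
    for each delay vector z, gather the horizon-step-ahead observations
    of all delay vectors within distance eps of z and return their
    standard deviation. Near-zero spread indicates K^n(z, .) is close
    to a Dirac mass; persistent spread flags candidate self-overlap."""
    Z = delay_embed(x, dim, tau)
    m = len(Z) - horizon                 # delay vectors with a valid future
    offset = (dim - 1) * tau + horizon   # row index -> index of its future sample
    spread = np.full(m, np.nan)
    for i in range(m):
        d = np.linalg.norm(Z[:m] - Z[i], axis=1)
        nbrs = np.flatnonzero(d < eps)
        if nbrs.size > 1:
            spread[i] = x[nbrs + offset].std()
    return spread

# Toy usage: scalar observations of the logistic map, where the delay
# vector determines the future and the spread should be near zero.
rng = np.random.default_rng(0)
x = np.empty(5000)
x[0] = rng.uniform(0.1, 0.9)
for t in range(1, len(x)):
    x[t] = 3.9 * x[t - 1] * (1.0 - x[t - 1])
print(np.nanmean(kernel_spread(x, dim=2, tau=1, horizon=1, eps=0.01)))
```

Under these assumptions, shrinking `eps` drives the spread toward the local continuity modulus of a deterministic closure, while regions where it stays bounded away from zero mark the irreducible, multi-valued behaviour the abstract describes.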