Analyzing dynamical data often requires knowledge of the temporal labels, but in many applications this information is unavailable. Recovering these labels, a task closely related to the seriation (or sequencing) problem, is therefore a crucial step. The task is challenging because the data are nonlinear and the underlying dynamical system, which may be periodic or aperiodic, is complex. Noise in the feature space further complicates the theoretical analysis. We develop spectral algorithms that leverage ideas from manifold learning to recover temporal labels from noisy data. We first construct the graph Laplacian of the data and then use the eigenvectors associated with its second (and third) smallest eigenvalues, the Fiedler vectors, to recover the temporal labels. The method applies to both periodic and aperiodic cases, and it does not require the monotonicity properties of the similarity matrix that existing spectral seriation algorithms commonly assume. We establish $\ell_{\infty}$ error bounds for our estimators of the temporal labels and the ranking, without assumptions on the eigen-gap. In numerical experiments, our method outperforms spectral seriation algorithms based on a similarity matrix. We further demonstrate the performance of our algorithms on a synthetic biomolecule data example.
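The two-step procedure described above (build a graph Laplacian, then read temporal information off its low eigenvectors) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel, the unnormalized Laplacian $L = D - W$, the `bandwidth` value, the toy curves, and the function names `graph_laplacian`, `recover_ranking`, and `recover_phase`. In the aperiodic case the ranking is read from the sorted second eigenvector and is determined only up to reversal; in the periodic case the phase is read from the angle of the (second, third) eigenvector pair and is determined only up to rotation and reflection.

```python
import numpy as np

def graph_laplacian(X, bandwidth):
    """Unnormalized graph Laplacian L = D - W with Gaussian weights (assumed kernel)."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return np.diag(W.sum(axis=1)) - W

def low_eigenvectors(X, bandwidth):
    """Eigenvectors of the second and third smallest Laplacian eigenvalues."""
    _, vecs = np.linalg.eigh(graph_laplacian(X, bandwidth))  # ascending order
    return vecs[:, 1], vecs[:, 2]

def recover_ranking(X, bandwidth):
    """Aperiodic case: rank samples by their entry in the second eigenvector."""
    v2, _ = low_eigenvectors(X, bandwidth)
    order = np.argsort(v2)
    ranks = np.empty(len(v2), dtype=int)
    ranks[order] = np.arange(len(v2))
    return ranks  # defined up to reversal

def recover_phase(X, bandwidth):
    """Periodic case: phase from the angle of the (second, third) eigenvector pair."""
    v2, v3 = low_eigenvectors(X, bandwidth)
    return np.arctan2(v3, v2)  # defined up to rotation and reflection

# Toy data (hypothetical): hidden labels t arrive in shuffled order.
rng = np.random.default_rng(0)
n, noise, bandwidth = 150, 0.01, 0.15
t = rng.permutation(np.arange(n) / n)

# Aperiodic example: a noisy three-quarter circular arc.
open_curve = np.column_stack([np.cos(1.5 * np.pi * t), np.sin(1.5 * np.pi * t)])
open_curve += noise * rng.standard_normal(open_curve.shape)
true_ranks = np.argsort(np.argsort(t))
est_ranks = recover_ranking(open_curve, bandwidth)
# Absolute rank correlation, since the ordering may come out reversed.
rank_corr = abs(np.corrcoef(est_ranks, true_ranks)[0, 1])

# Periodic example: a noisy closed circle.
closed_curve = np.column_stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
closed_curve += noise * rng.standard_normal(closed_curve.shape)
phase = recover_phase(closed_curve, bandwidth)
z, w = np.exp(1j * phase), np.exp(2j * np.pi * t)
# Agreement score in [0, 1] that ignores a global rotation or reflection.
phase_agreement = max(abs(np.mean(z * np.conj(w))), abs(np.mean(z * w)))

print("aperiodic |rank correlation|:", rank_corr)
print("periodic phase agreement:", phase_agreement)
```

On a closed curve the second and third eigenvalues are nearly equal, so the individual eigenvectors are not stable, but the angle computed from the pair remains meaningful. This is one reason the sketch uses both vectors in the periodic case rather than sorting a single one.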