Many experimental time series measurements share unobserved causal drivers. Examples include genes targeted by transcription factors, ocean flows influenced by large-scale atmospheric currents, and motor circuits steered by descending neurons. Reliably inferring this unseen driving force is necessary to understand the intermittent nature of top-down control schemes in diverse biological and engineered systems. Here, we introduce a new unsupervised learning algorithm that uses recurrences in time series measurements to gradually reconstruct an unobserved driving signal. Drawing on the mathematical theory of skew-product dynamical systems, we identify recurrence events shared across response time series, which implicitly define a recurrence graph with glass-like structure. As the amount or quality of observed data improves, this recurrence graph undergoes a percolation transition manifesting as weak ergodicity breaking for random walks on the induced landscape -- revealing the shared driver's dynamics, even in the presence of strongly corrupted or noisy measurements. Across several thousand random dynamical systems, we empirically quantify the dependence of reconstruction accuracy on the rate of information transfer from a chaotic driver to the response systems, and we find that effective reconstruction proceeds through gradual approximation of the driver's dominant orbit topology. Through extensive benchmarks against classical and neural-network-based signal processing techniques, we demonstrate our method's strong ability to extract causal driving signals from diverse real-world datasets spanning ecology, genomics, fluid dynamics, and physiology.
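To make the idea of recurrence events shared across response time series concrete, the following is a minimal sketch and not the algorithm introduced here. It assumes scalar response series sampled at common times, a quantile-based distance threshold, and a logical AND across series to define "shared"; the function names `recurrence_matrix` and `shared_recurrence_graph` and the parameter `quantile` are hypothetical, and the toy sinusoidal driver is purely illustrative.

```python
import numpy as np


def recurrence_matrix(x, quantile=0.1):
    """Boolean recurrence matrix for one scalar series.

    R[i, j] is True when |x[i] - x[j]| is at or below the chosen quantile
    of all pairwise distances (an assumed thresholding rule).
    """
    x = np.asarray(x, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])  # pairwise distances between time points
    eps = np.quantile(dist, quantile)       # data-dependent recurrence threshold
    return dist <= eps


def shared_recurrence_graph(responses, quantile=0.1):
    """Adjacency matrix of time points that recur in every response series.

    `responses` has shape (n_times, n_series). An edge (i, j) is kept only
    if times i and j are recurrent in all series (assumed AND rule).
    """
    responses = np.asarray(responses, dtype=float)
    n_times = responses.shape[0]
    shared = np.ones((n_times, n_times), dtype=bool)
    for series in responses.T:
        shared &= recurrence_matrix(series, quantile)
    np.fill_diagonal(shared, False)  # drop trivial self-recurrences
    return shared


if __name__ == "__main__":
    # Toy data (illustrative only): three noisy responses to one hidden driver.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 8.0 * np.pi, 200)
    driver = np.sin(t)
    responses = np.stack(
        [gain * driver + 0.05 * rng.standard_normal(t.size) for gain in (1.0, 0.5, 2.0)],
        axis=1,
    )
    graph = shared_recurrence_graph(responses, quantile=0.2)
    print("nodes:", graph.shape[0], "edges:", int(graph.sum()) // 2)
```

In this sketch, requiring a recurrence to appear in every series is what filters out coincidences specific to a single response; a softer rule, such as keeping edges supported by a majority of series, would be an equally plausible assumption under noisy measurements.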