Investigating relationships between time series, particularly lead-lag effects, is a common problem across disciplines, especially when uncovering biological processes. However, analyzing time series presents several challenges. First, for technical reasons, observations are often made at non-uniform intervals. Second, some lead-lag effects are transient, so the time lag must be estimated from a limited number of time points. Third, external factors also affect these time series, so a similarity metric is needed to assess the lead-lag relationship. To address these issues, we introduce a model based on Gaussian processes that offers the flexibility to estimate lead-lag effects for irregularly sampled time series. In addition, our method outputs dissimilarity scores, which broadens its applications to tasks such as ranking or clustering multiple pairs of time series by the strength of their lead-lag effects in the presence of external factors. Crucially, we provide a series of theoretical proofs to substantiate the validity of our proposed kernels and the identifiability of the kernel parameters. Compared with other leading methods, our model shows improvements in a range of simulations and real-world applications, particularly in the study of dynamic chromatin interactions.
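To make the idea of Gaussian-process lag estimation on irregular samples concrete, the following is a minimal sketch and not the model proposed here. It assumes a plain squared-exponential kernel on time-shifted inputs, fixed hyperparameters, and a grid search over candidate lags. The names `estimate_lag`, `log_marginal_likelihood`, `length_scale`, and `noise` are illustrative assumptions, as is the synthetic sine data. It omits the dissimilarity score and the kernels whose validity and identifiability are proved in the paper.

```python
import math
import random

def rbf(u, v, length_scale):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((u - v) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_marginal_likelihood(inputs, y, length_scale, noise):
    """GP log marginal likelihood of y at the given inputs (zero mean)."""
    n = len(y)
    K = [[rbf(inputs[i], inputs[j], length_scale) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution: solve L z = y, so that y^T K^{-1} y = z^T z.
    z = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)

def estimate_lag(t_a, y_a, t_b, y_b, candidates, length_scale=1.0, noise=0.1):
    """Pick the candidate lag that maximizes the joint GP likelihood.

    Assumption: series b follows the same latent function as series a,
    delayed by `lag`, so shifting b's time stamps back by `lag` aligns them.
    """
    best_lag, best_ll = None, -math.inf
    for lag in candidates:
        inputs = list(t_a) + [t - lag for t in t_b]
        ll = log_marginal_likelihood(inputs, list(y_a) + list(y_b), length_scale, noise)
        if ll > best_ll:
            best_lag, best_ll = lag, ll
    return best_lag, best_ll

# Synthetic example: two irregularly sampled series, b lags a by 2 time units.
random.seed(0)
true_lag = 2.0
t_a = sorted(random.uniform(0.0, 10.0) for _ in range(20))
t_b = sorted(random.uniform(0.0, 10.0) for _ in range(20))
y_a = [math.sin(t) + random.gauss(0.0, 0.05) for t in t_a]
y_b = [math.sin(t - true_lag) + random.gauss(0.0, 0.05) for t in t_b]

candidates = [0.5 * k for k in range(9)]  # 0.0, 0.5, ..., 4.0
lag_hat, ll_hat = estimate_lag(t_a, y_a, t_b, y_b, candidates)
print("estimated lag:", lag_hat)
```

Because the two series are never required to share time stamps, the same code applies to uniformly and non-uniformly sampled data. In practice the hyperparameters would be fitted rather than fixed, and the lag would be optimized continuously rather than over a coarse grid.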