Investigating the relationship between time series, particularly the lead-lag effect, is a common problem across many disciplines, especially when uncovering biological processes. However, analyzing time series presents several challenges. Firstly, for technical reasons, observations are often made at time points that are not uniformly spaced. Secondly, some lead-lag effects are transient, so the time lag must be estimated from a limited number of time points. Thirdly, external factors also impact these time series, requiring a similarity metric to assess the lead-lag relationship. To address these issues, we introduce a model grounded in the Gaussian process, which affords the flexibility to estimate lead-lag effects for irregular time series. In addition, our method outputs dissimilarity scores, broadening its applications to tasks such as ranking or clustering multiple pairs of time series by the strength of their lead-lag effects in the presence of external factors. Crucially, we offer a series of theoretical proofs to substantiate the validity of our proposed kernels and the identifiability of the kernel parameters. Our model shows improvements over other leading methods in various simulations and real-world applications, particularly in the study of dynamic chromatin interactions.