In multivariate time series systems, key insights can be obtained by discovering the lead-lag relationships inherent in the data. A lead-lag relationship is a dependence between two time series that are shifted in time relative to one another, and it can be leveraged for control, forecasting, or clustering. We develop a clustering-driven methodology for the robust detection of lead-lag relationships in lagged multi-factor models. The pipeline takes a set of time series as input and uses a sliding window to extract subsequences from each one, creating an enlarged universe of subsequence time series. Various clustering techniques (such as k-means++ and spectral clustering) are then applied to this universe, using a variety of pairwise similarity measures, including nonlinear ones. Once the clusters have been extracted, lead-lag estimates across clusters are robustly aggregated to enhance the identification of consistent relationships in the original universe. We establish connections to the multireference alignment problem in both the homogeneous and heterogeneous settings. Since multivariate time series are ubiquitous across a wide range of domains, we demonstrate that our method not only robustly detects lead-lag relationships in financial markets, but also yields insightful results when applied to an environmental data set.
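As a rough illustration of the sliding-window and robust-aggregation ideas described in the abstract, the following minimal sketch estimates a lag on each window by maximising the Pearson correlation and then takes the median across windows. It is not the authors' pipeline: the clustering step is omitted entirely, plain cross-correlation stands in for the similarity measures, and every function name, parameter, and the synthetic data are assumptions made for this example.

```python
import numpy as np


def sliding_windows(x, width, step):
    """Return a 2-D array whose rows are subsequences of x of length `width`."""
    starts = range(0, len(x) - width + 1, step)
    return np.stack([x[s:s + width] for s in starts])


def estimate_lag(a, b, max_lag):
    """Return the lag k in [-max_lag, max_lag] maximising corr(a[t], b[t + k]).

    A positive result means that a leads b by k steps.
    """
    best_lag, best_corr = 0, -np.inf
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            u, v = a[:len(a) - k], b[k:]
        else:
            u, v = a[-k:], b[:len(b) + k]
        c = np.corrcoef(u, v)[0, 1]
        if c > best_corr:
            best_lag, best_corr = k, c
    return best_lag


def robust_lead_lag(x, y, width, step, max_lag):
    """Aggregate per-window lag estimates with a median (a simple robust choice)."""
    wx = sliding_windows(x, width, step)
    wy = sliding_windows(y, width, step)
    lags = [estimate_lag(p, q, max_lag) for p, q in zip(wx, wy)]
    return int(np.median(lags))


# Synthetic single-factor example (assumed data): x leads y by 3 steps.
rng = np.random.default_rng(0)
true_lag = 3
factor = rng.standard_normal(500 + true_lag)
x = factor[true_lag:] + 0.1 * rng.standard_normal(500)   # x[t] ~ factor[t + 3]
y = factor[:-true_lag] + 0.1 * rng.standard_normal(500)  # y[t] ~ factor[t]

estimated_lag = robust_lead_lag(x, y, width=100, step=50, max_lag=10)
print("estimated lag:", estimated_lag)
```

The median is used here only because it tolerates a few badly estimated windows; the aggregation across clusters described in the abstract may differ.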