The main purpose of this article is to extend the notion of statistical depth to sample paths of a Markov chain, a very popular probabilistic model for describing parsimoniously random phenomena with temporal causality. Initially introduced to define a center-outward ordering of points in the support of a multivariate distribution, depth functions make it possible to generalize the notions of quantiles and (signed) ranks to observations in $\mathbb{R}^d$ with $d>1$, as well as the statistical procedures based on such quantities, in particular for (unsupervised) anomaly detection tasks. In this paper, overcoming the lack of a natural order on the torus composed of all possible trajectories of finite length, we develop a general theoretical framework for evaluating the depth of a Markov sample path and recovering it statistically from an estimate of the chain's transition probability, with (non-)asymptotic guarantees. We also detail its numerous applications, focusing particularly on anomaly detection, a key task in various fields involving the analysis of (supposedly) Markov time series (\textit{e.g.} health monitoring of complex infrastructures, security). Beyond describing the proposed methodology and the statistical analysis carried out to guarantee its validity, we present numerical experiments that provide strong empirical evidence of the relevance of the novel concept introduced here for quantifying the degree of abnormality of Markov path sequences of variable length.