This paper concerns the statistical analysis of data consisting of multiple series of probability measures, indexed by distinct time instants and supported on a bounded interval of the real line. By modeling these time-dependent probability measures as random objects in the Wasserstein space, we propose a new auto-regressive model for multivariate distributional time series. Using the theory of iterated random function systems, we provide results on the second-order stationarity of the solution of this model. We also propose a consistent estimator of its auto-regressive coefficients. Because we impose simplex constraints on the model coefficients, the estimator learned under these constraints naturally has a sparse structure. This sparsity allows the model to be used to learn a graph of temporal dependency from multivariate distributional time series. We explore the numerical performance of our estimation procedure on simulated data. To illustrate the benefits of our approach for real data analysis, we also apply the methodology to two data sets: age distributions in different countries and the bike-sharing network in Paris.
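To illustrate why simplex constraints tend to yield sparse coefficient estimates, the following is a minimal sketch of simplex-constrained least squares solved by projected gradient descent. It is not the estimator proposed in the paper: the function names `project_to_simplex` and `fit_simplex_weights`, the synthetic Gaussian design, and the noise level are all assumptions made for illustration. For measures supported on an interval, one could imagine the columns of `X` holding lagged quantile functions evaluated on a grid, but that mapping is likewise only an assumption here.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    n = v.size
    u = np.sort(v)[::-1]                 # entries in decreasing order
    css = np.cumsum(u) - 1.0             # cumulative sums shifted by the target total
    k = np.arange(1, n + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1.0)         # threshold subtracted from every entry
    return np.maximum(v - tau, 0.0)      # clipping at zero is what creates exact zeros


def fit_simplex_weights(X, y, n_iter=2000):
    """Minimise 0.5 * ||X w - y||^2 over the simplex by projected gradient descent."""
    n_features = X.shape[1]
    w = np.full(n_features, 1.0 / n_features)   # start at the simplex barycentre
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = project_to_simplex(w - step * grad)
    return w


# Synthetic example (hypothetical data): only two of five predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([0.7, 0.3, 0.0, 0.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=200)

w_hat = fit_simplex_weights(X, y)
print(np.round(w_hat, 3))
```

The projection step clips negative entries to zero, so coefficients that the data do not support are frequently set exactly to zero rather than merely shrunk. In a multivariate setting, the nonzero entries of each fitted weight vector could then be read as the incoming edges of one node in a dependency graph.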