We develop statistical models for samples of distribution-valued stochastic processes through time-varying optimal transport process representations under the Wasserstein metric when the values of the process are univariate distributions. While functional data analysis provides a toolbox for the analysis of samples of real- or vector-valued processes, there is at present no coherent statistical methodology available for samples of distribution-valued processes, which are increasingly encountered in data analysis. To address the need for such methodology, we introduce a transport model for samples of distribution-valued stochastic processes that implements an intrinsic approach whereby distributions are represented by optimal transports. Substituting transports for distributions addresses the challenge of centering distribution-valued processes and leads to a useful and interpretable representation of each realized process by an overall transport and a real-valued trajectory, utilizing a scalar multiplication operation for transports. This representation facilitates a connection to Gaussian processes that proves useful, especially for the case where the distribution-valued processes are only observed on a sparse grid of time points. We study the convergence of the key components of the proposed representation to their population targets and demonstrate the practical utility of the proposed approach through simulations and application examples.
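As a minimal sketch of what representing univariate distributions by optimal transports can look like, the Python snippet below uses the standard one-dimensional facts that the optimal transport map from an atomless distribution mu to nu is the quantile function of nu composed with the distribution function of mu, and that the 2-Wasserstein distance is the L2 distance between quantile functions. All function names are hypothetical. The scaling rule for factors between 0 and 1 (moving only a fraction of the way along the transport) is an illustrative assumption and not necessarily the scalar multiplication defined in the paper.

```python
import numpy as np

def quantiles_on_grid(sample, m=200):
    """Empirical quantile function of a 1-D sample on m midpoint probabilities."""
    probs = (np.arange(m) + 0.5) / m
    return probs, np.quantile(np.asarray(sample, dtype=float), probs)

def wasserstein2(q_mu, q_nu):
    """2-Wasserstein distance as the L2 distance between quantile functions
    (Riemann sum on a shared, equally spaced probability grid)."""
    return float(np.sqrt(np.mean((q_mu - q_nu) ** 2)))

def transport_map(q_mu, q_nu):
    """Optimal transport map T = Q_nu(F_mu(x)), which sends the p-th quantile
    of mu to the p-th quantile of nu; values in between are interpolated."""
    def T(x):
        return np.interp(x, q_mu, q_nu)
    return T

def scale_transport(T, a):
    """Assumed scaling for 0 <= a <= 1: x -> x + a * (T(x) - x)."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("this sketch only covers 0 <= a <= 1")
    def scaled(x):
        x = np.asarray(x, dtype=float)
        return x + a * (T(x) - x)
    return scaled

# Deterministic toy data: nu is an affine image of mu, so T(x) = 2x + 1.
base = np.linspace(-3.0, 3.0, 501)
_, q_mu = quantiles_on_grid(base)
_, q_nu = quantiles_on_grid(2.0 * base + 1.0)
T = transport_map(q_mu, q_nu)
half_T = scale_transport(T, 0.5)
```

In this toy setup, `half_T` moves each point halfway toward its image under `T`. A distribution-valued process observed at several time points could then be summarized by one such map per time point, though how the paper combines them is not specified in the abstract.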