We develop statistical models for samples of distribution-valued stochastic processes featuring time-indexed univariate distributions, with emphasis on functional principal component analysis. The proposed model takes an intrinsic rather than a transformation-based approach. The starting point is a transport process representation for distribution-valued processes under the Wasserstein metric. Substituting transports for distributions addresses the challenge of centering distribution-valued processes and leads to a useful and interpretable decomposition of each realized process into a process-specific single transport and a real-valued trajectory. This representation makes it possible to use a scalar multiplication operation for transports, which facilitates not only functional principal component analysis but also the introduction of a latent Gaussian process. This Gaussian process proves especially useful when the distribution-valued processes are observed only on a sparse grid of time points, establishing an approach for longitudinal distribution-valued data. We study the convergence of the key components of this novel representation to their population targets and demonstrate the practical utility of the proposed approach through simulations and several data illustrations.