This article provides an overview of the statistical modeling of complex data, as increasingly encountered in modern data analysis. It is argued that such data can often be described as elements of a metric space that satisfies certain structural conditions and is equipped with a probability measure. We refer to the random elements of such spaces as random objects, and to the emerging field that deals with their statistical analysis as metric statistics. Metric statistics provides methodology, theory and visualization tools for populations of random objects from which samples are available. These tools cover statistical description, the quantification of variation, centrality and quantiles, and regression and inference. In addition to briefly reviewing current concepts, we focus on distance profiles as a major tool for object data. The distance profile of an object is the one-dimensional distribution of its distances to the random object, and we use distance profiles together with the pairwise Wasserstein transports between these distance distributions. These pairwise transports lead to intuitive and interpretable notions of transport ranks and transport quantiles, as well as to two-sample inference. An associated profile metric complements the original metric of the object space and may reveal important features of object data in data analysis. We demonstrate these tools for the analysis of complex data through various examples and visualizations.
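The pairing of distance profiles with one-dimensional Wasserstein transports can be made concrete. The sketch below makes three assumptions. First, the objects are available only through a pairwise distance matrix. Second, the empirical distance profile of a sample point is taken as its distances to all sample points, including the zero self-distance. Third, two profiles are compared with the p-Wasserstein distance, standing in for the profile metric. The function names `distance_profiles` and `profile_metric` are hypothetical and do not come from the article. The sketch relies on a standard fact: for two equal-weight empirical distributions on the real line with the same number of atoms, optimal transport matches sorted values.

```python
import numpy as np


def distance_profiles(D):
    """Empirical distance profiles from a pairwise distance matrix D.

    Row i, sorted, is the empirical quantile function of the distances
    d(x_i, x_j) over all sample points j (the self-distance 0 is included,
    an assumption of this sketch).
    """
    D = np.asarray(D, dtype=float)
    return np.sort(D, axis=1)


def profile_metric(D, p=2):
    """Pairwise p-Wasserstein distances between empirical distance profiles.

    For equal-weight empirical distributions on the real line with the same
    number of atoms, the optimal transport pairs sorted values, so W_p is the
    L^p distance between the sorted rows.
    """
    Q = distance_profiles(D)  # shape (n, n): one sorted profile per object
    diffs = np.abs(Q[:, None, :] - Q[None, :, :]) ** p  # shape (n, n, n)
    return diffs.mean(axis=2) ** (1.0 / p)


# Toy example: four points on the real line, one of them far from the rest.
x = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(x[:, None] - x[None, :])
W1 = profile_metric(D, p=1)
print(W1[0, 1], W1[0, 3])  # 0.5 3.5
```

In this toy example, the outlying point at 10 has a distance profile that is far from the profiles of the other points. This is the kind of feature a profile metric can expose. The all-pairs array uses memory cubic in the sample size, so it suits small illustrative samples only.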