The increasing need to analyse large volumes of data has led to the development of Symbolic Data Analysis as a promising field for tackling the data challenges of our time. New data types, such as interval-valued data, have brought fresh theoretical and methodological problems to be solved. In this paper, we derive explicit formulas for computing the Mallows distance, also known as the $L_2$ Wasserstein distance, between two \textit{p}-dimensional intervals, using information about the distribution of the microdata. We show that this distance can be written as a Mahalanobis distance between two 2\textit{p}-dimensional vectors. This analysis leads to a generalisation of the definitions of the expected value and covariance matrix of an interval-valued random vector. These results bring theoretical support and interpretability to state-of-the-art contributions. Additionally, we discuss real examples that illustrate how different levels of available information on the microdata can be modelled, leading to proper estimates of the measures of location and association.
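To make the link between the Mallows and Mahalanobis distances concrete, the following is a minimal sketch for the simplest case only: one-dimensional intervals whose microdata are assumed to be uniformly distributed. Under that assumption, with centre $c$ and half-range $r$, the squared distance reduces to $(c_1-c_2)^2 + (r_1-r_2)^2/3$, which is a quadratic form in the vector $(c, r)$ with weight matrix $\mathrm{diag}(1, 1/3)$. The function names are hypothetical and chosen for illustration; the general \textit{p}-dimensional formulas and other microdata distributions treated in the paper are not reproduced here.

```python
def centre_and_half_range(interval):
    """Return (centre, half-range) of an interval given as (lower, upper)."""
    lower, upper = interval
    return (lower + upper) / 2.0, (upper - lower) / 2.0


def mallows_sq_uniform(a, b):
    """Squared Mallows (L2 Wasserstein) distance between intervals a and b.

    ASSUMPTION: microdata are uniformly distributed within each interval.
    The result is a quadratic form in (centre, half-range) with
    weight matrix diag(1, 1/3).
    """
    c1, r1 = centre_and_half_range(a)
    c2, r2 = centre_and_half_range(b)
    return (c1 - c2) ** 2 + (r1 - r2) ** 2 / 3.0


def mallows_sq_numeric(a, b, n=100_000):
    """Midpoint-rule check: integrate the squared difference of the two
    uniform quantile functions over t in (0, 1)."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        qa = a[0] + t * (a[1] - a[0])  # quantile function of U(a)
        qb = b[0] + t * (b[1] - b[0])  # quantile function of U(b)
        total += (qa - qb) ** 2
    return total / n


if __name__ == "__main__":
    a, b = (0.0, 2.0), (1.0, 5.0)
    print(mallows_sq_uniform(a, b))   # closed form
    print(mallows_sq_numeric(a, b))   # numerical approximation
```

The numerical routine serves only as a sanity check on the closed form; the two values should agree up to the discretisation error of the midpoint rule.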