Interval-valued data are among the most common symbolic data types and preserve the underlying variability of the data. The interval mean and covariance matrix can be estimated using the barycenter approach based on the Mallows distance. However, as with conventional data, these classical estimates can be strongly affected by anomalous data points, which are frequent in real-life datasets. To address this problem, we develop a robust alternative that estimates location and scale by extending the Minimum Covariance Determinant estimator to interval-valued data. The algorithm yields a robust Interval-Mahalanobis distance, which can be used to detect anomalous observations based on adaptive cutoff values. Through extensive simulation studies across various contamination levels, we demonstrate that the interval-valued robust estimator consistently outperforms classical methods in covariance matrix estimation and achieves superior outlier detection accuracy. Finally, the applicability and effectiveness of the proposed method are illustrated on real-world datasets.
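To make the classical (non-robust) starting point concrete, the following is a minimal sketch of the barycenter mean and covariance under the Mallows distance. It rests on assumptions not stated in the abstract: each interval is treated as a uniform distribution over its range, so that the squared Mallows distance between two intervals reduces to the squared difference of centers plus one third of the squared difference of half-ranges. The function names (`mallows_sq`, `barycenter`, `interval_cov`) and the divisor `n` in the covariance are illustrative choices, not the authors' implementation, and the robust Minimum Covariance Determinant extension is not shown.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper), with lower <= upper


def center_halfrange(iv: Interval) -> Tuple[float, float]:
    """Re-express an interval as (center, half-range)."""
    lo, hi = iv
    if lo > hi:
        raise ValueError("interval lower bound exceeds upper bound")
    return (lo + hi) / 2.0, (hi - lo) / 2.0


def mallows_sq(a: Interval, b: Interval) -> float:
    """Squared Mallows distance, ASSUMING uniform distributions on both intervals."""
    ca, ra = center_halfrange(a)
    cb, rb = center_halfrange(b)
    return (ca - cb) ** 2 + (ra - rb) ** 2 / 3.0


def barycenter(sample: List[Interval]) -> Interval:
    """Interval minimizing the mean squared Mallows distance to the sample:
    its center and half-range are the averages of the sample's."""
    n = len(sample)
    c = sum(center_halfrange(iv)[0] for iv in sample) / n
    r = sum(center_halfrange(iv)[1] for iv in sample) / n
    return c - r, c + r


def interval_cov(x: List[Interval], y: List[Interval]) -> float:
    """Covariance of two interval variables under the same uniform assumption:
    covariance of centers plus one third of the covariance of half-ranges.
    Uses divisor n (an illustrative choice)."""
    if len(x) != len(y):
        raise ValueError("samples must have equal length")
    n = len(x)
    cx, rx = zip(*(center_halfrange(iv) for iv in x))
    cy, ry = zip(*(center_halfrange(iv) for iv in y))

    def cov(u, v):
        mu, mv = sum(u) / n, sum(v) / n
        return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

    return cov(cx, cy) + cov(rx, ry) / 3.0
```

With these definitions, `interval_cov(x, x)` equals the mean squared Mallows distance from the observations to their barycenter, which is why a single extreme interval can inflate the classical estimate and why a robust alternative is of interest.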