Robust covariance estimation and explainable outlier detection for matrix-valued data

This work introduces the Matrix Minimum Covariance Determinant (MMCD) method, a novel robust procedure for estimating location and covariance in data that are naturally represented as matrices. Standard robust multivariate estimators can be applied to such data only after vectorizing the matrix-variate samples, which produces high-dimensional datasets. The MMCD estimators instead account for the matrix-variate structure and consistently estimate the mean matrix, as well as the rowwise and columnwise covariance matrices, in the class of matrix-variate elliptical distributions. We show that the MMCD estimators are matrix affine equivariant and achieve a higher breakdown point than the maximum attainable by any multivariate, affine equivariant location/covariance estimator applied to the vectorized data. An efficient algorithm with convergence guarantees is proposed and implemented. As a result, robust Mahalanobis distances based on the MMCD estimators offer a reliable tool for outlier detection. We also extend the concept of Shapley values for outlier explanation to the matrix-variate setting, which enables the decomposition of the squared Mahalanobis distances into contributions of the rows, columns, or individual cells of matrix-valued observations. Both the theoretical guarantees and the simulations show that the MMCD estimators outperform robust estimators based on vectorized observations, offering better computational efficiency and improved robustness. Moreover, real-world data examples demonstrate the practical relevance of the MMCD estimators and the resulting robust Shapley values.
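To make the distance and its decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes that location and scatter estimates (here called `M`, `Sigma_row`, `Sigma_col`, all hypothetical names) are already available, for example from a robust fit, and it does not implement the MMCD algorithm itself. It computes the squared matrix Mahalanobis distance tr(Sigma_col^{-1} (X - M)^T Sigma_row^{-1} (X - M)), which equals the ordinary squared Mahalanobis distance of the vectorized observation under a Kronecker-structured covariance. The cellwise split shown is one simple additive decomposition of that distance; treating it as the Shapley-based decomposition described above is an assumption.

```python
import numpy as np


def matrix_mahalanobis_sq(X, M, Sigma_row, Sigma_col):
    """Squared matrix Mahalanobis distance of a p x q observation X.

    Computes tr(Sigma_col^{-1} R^T Sigma_row^{-1} R) with R = X - M,
    using linear solves instead of explicit inverses.
    """
    R = X - M
    A = np.linalg.solve(Sigma_row, R)      # Sigma_row^{-1} R, shape (p, q)
    B = np.linalg.solve(Sigma_col, R.T)    # Sigma_col^{-1} R^T, shape (q, p)
    return float(np.trace(B @ A))


def cell_contributions(X, M, Sigma_row, Sigma_col):
    """Additive cellwise split of the squared distance (illustrative assumption).

    Returns the p x q matrix R * (Sigma_row^{-1} R Sigma_col^{-1}); its entries
    sum to the squared distance. Row sums and column sums give rowwise and
    columnwise contributions. Sigma_col is assumed symmetric.
    """
    R = X - M
    W = np.linalg.solve(Sigma_row, R)          # Sigma_row^{-1} R
    W = np.linalg.solve(Sigma_col, W.T).T      # ... times Sigma_col^{-1}
    return R * W


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    p, q = 3, 4
    X = rng.normal(size=(p, q))
    M = np.zeros((p, q))
    Sigma_row, Sigma_col = np.eye(p), np.eye(q)  # placeholder estimates
    C = cell_contributions(X, M, Sigma_row, Sigma_col)
    print("squared distance:", matrix_mahalanobis_sq(X, M, Sigma_row, Sigma_col))
    print("row contributions:", C.sum(axis=1))
    print("column contributions:", C.sum(axis=0))
```

Working with the two small scatter matrices (p x p and q x q) avoids forming or inverting the pq x pq covariance of the vectorized data, which illustrates why the matrix-variate formulation is cheaper than vectorization.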
