Statistical data depth plays an important role in the analysis of multivariate data sets. Its main outcome is a center-outward ordering of the observations, which can be used both to highlight features of the underlying distribution and as input to further statistical analysis. An important property of data depth concerns symmetric distributions: the point with the highest depth value, the center, coincides with the point of symmetry. However, there are applications in which it is more natural to consider symmetry with respect to a subspace of a given dimension rather than with respect to a point, which is a subspace of dimension zero. We provide a general framework for constructing statistical data depths that attain their maximum value on a subspace and therefore yield a center-outward ordering from that subspace. We refer to these as central subspace data depths. In particular, if the distribution is symmetric with respect to a subspace, then the depth is maximized on that subspace. We introduce general notions of symmetry about a subspace for distributions, study the properties of central subspace data depths, and establish asymptotic convergence of the corresponding sample versions. Additionally, we discuss connections with projection pursuit and dimension reduction. An application to custom data fraud detection illustrates the usefulness of the proposed approach and underscores its potential.