Statistical data depth plays an important role in the analysis of multivariate data sets. Its main outcome is a center-outward ordering of the observations, which can be used both to highlight features of the underlying distribution of the data and as input to further statistical analysis. An important property of data depth concerns symmetric distributions: the point with the highest depth value, the center, coincides with the point of symmetry. However, there are applications in which it is more natural to consider symmetry with respect to a subspace of a certain dimension rather than with respect to a point, i.e. a subspace of dimension zero. We provide a general framework for constructing statistical data depths that attain their maximum value on a subspace and thus provide a center-outward ordering from that subspace. We refer to these data depths as central subspace data depths. Moreover, if the distribution is symmetric with respect to a subspace, then the depth is maximized at that subspace. We introduce general notions of symmetry about a subspace for distributions, study the properties of central subspace data depths, and establish asymptotic convergence of the corresponding sample versions. Additionally, we discuss connections with projection pursuit and dimension reduction. An application based on custom data fraud detection illustrates the importance of the proposed approach and its potential.