Data depth is a powerful nonparametric tool originally proposed to rank multivariate data from the center outward. In this context, one of the most archetypical depth notions is Tukey's halfspace depth. In the last few decades, notions of depth have also been proposed for functional data. However, Tukey's depth cannot be extended to handle functional data because of its degeneracy. Here, we propose a new halfspace depth for functional data which avoids degeneracy by regularization: the halfspace projection directions are constrained to have a small reproducing kernel Hilbert space norm. Desirable theoretical properties of the proposed depth, such as isometry invariance, maximality at center, monotonicity relative to a deepest point, upper semi-continuity, and consistency, are established. Moreover, the regularized halfspace depth can rank functional data with varying emphasis on shape or magnitude, depending on the regularization. A new outlier detection approach is also proposed, which is capable of detecting both shape and magnitude outliers. It is applicable to trajectories in L2, a very general space of functions that includes non-smooth trajectories. Extensive numerical studies show that our methods perform well in detecting outliers of different types. Three real data examples showcase the proposed depth notion.
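For readers unfamiliar with the multivariate starting point, the following is a minimal sketch of the classical Tukey halfspace depth of a point with respect to a finite sample, not of the regularized functional depth proposed here. The infimum over all directions is approximated by a minimum over randomly drawn unit vectors, which is a common shortcut and only gives an upper bound on the exact sample depth. The function name `halfspace_depth` and its parameters `n_dirs` and `seed` are hypothetical choices made for illustration.

```python
import numpy as np

def halfspace_depth(x, data, n_dirs=2000, seed=0):
    """Approximate the sample Tukey halfspace depth of x (illustrative sketch).

    The depth is the smallest fraction of sample points lying in a closed
    halfspace whose boundary passes through x. Here the minimum is taken
    over n_dirs random unit directions, so the result is an upper bound
    on the exact sample depth.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    # Draw random directions and normalize them to unit length.
    dirs = rng.standard_normal((n_dirs, data.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project the centered sample onto every direction: shape (n, n_dirs).
    proj = (data - x) @ dirs.T
    # Fraction of points in each halfspace {y : <y - x, u> >= 0}.
    frac = (proj >= 0).mean(axis=0)
    return float(frac.min())

# Usage: a central point should be deeper than a point far from the sample.
sample = np.random.default_rng(1).standard_normal((200, 2))
depth_center = halfspace_depth([0.0, 0.0], sample)
depth_far = halfspace_depth([10.0, 10.0], sample)
```

In this multivariate setting the directions range over the whole unit sphere. The abstract's regularization instead restricts the admissible directions to those with small reproducing kernel Hilbert space norm, which this sketch does not attempt to reproduce.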