The extremal dependence structure of a regularly varying $d$-dimensional random vector can be described by its angular measure. The standard nonparametric estimator of this measure is the empirical measure of the observed angles of the $k$ random vectors with largest norm, for a suitably chosen number $k$. Due to the curse of dimensionality, this estimator is often inaccurate for moderate or large $d$. If the angular measure is concentrated in a neighborhood of a lower-dimensional subspace, then first projecting the data onto a lower-dimensional subspace obtained by a principal component analysis (PCA) of the angles of extreme observations can substantially improve the performance of the estimator. We derive the asymptotic behavior of such PCA projections and of the resulting excess risk. In particular, we show that, under mild conditions, the excess risk (as a function of $k$) decreases much faster than suggested by the empirical risk bounds obtained in \cite{DS21}. Moreover, we establish functional limit theorems for local empirical processes of the (empirical) reconstruction error of projections, uniformly over neighborhoods of the true optimal projection. Based on these asymptotic results, we propose a data-driven method to select the dimension of the projection space. Finally, the finite-sample performance of the resulting estimators is examined in a simulation study.
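To make the procedure concrete, the following is a minimal Python sketch of the projection step, not the authors' implementation. It assumes the Euclidean norm, an uncentered PCA based on the empirical second-moment matrix of the extreme angles, and the mean squared distance between an angle and its projection as the empirical reconstruction error. The function names (`extreme_angles`, `pca_projection`, `reconstruction_error`) and the simulated data are hypothetical and serve only as illustration.

```python
import numpy as np


def extreme_angles(X, k):
    """Angles X_i / ||X_i|| of the k observations with largest Euclidean norm."""
    norms = np.linalg.norm(X, axis=1)
    idx = np.argsort(norms)[-k:]          # indices of the k largest norms
    return X[idx] / norms[idx, None]


def pca_projection(angles, p):
    """Orthogonal projection matrix onto the span of the top-p eigenvectors.

    Assumption: PCA is applied to the uncentered empirical second-moment
    matrix of the angles.
    """
    S = angles.T @ angles / angles.shape[0]
    _, eigvecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    V = eigvecs[:, -p:]                   # eigenvectors of the p largest eigenvalues
    return V @ V.T


def reconstruction_error(angles, P):
    """Mean squared distance between the angles and their projections."""
    residual = angles - angles @ P        # P is symmetric
    return np.mean(np.sum(residual**2, axis=1))


if __name__ == "__main__":
    # Illustrative data only: heavy-tailed radius times directions that lie
    # close to a 2-dimensional subspace of R^5.
    rng = np.random.default_rng(0)
    n, d, k = 5000, 5, 200
    directions = np.zeros((n, d))
    directions[:, :2] = rng.standard_normal((n, 2))
    directions += 0.05 * rng.standard_normal((n, d))
    radius = rng.pareto(2.0, size=(n, 1)) + 1.0
    X = radius * directions

    angles = extreme_angles(X, k)
    for p in range(1, d + 1):
        err = reconstruction_error(angles, pca_projection(angles, p))
        print(p, err)
```

In this sketch, inspecting how the empirical reconstruction error varies with the dimension `p` is only a crude stand-in for the data-driven selection method proposed in the paper, whose details are not given in the abstract.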