Clustering Tails in High Dimension

One way to combat the scarcity of tail observations in extreme value analysis is to pool information from multiple datasets that share similar tail properties, for instance a common extreme value index. For a multivariate dataset, this means grouping the dimensions into clusters first, before applying any pooling technique. This paper addresses the problem of clustering the dimensions of a high-dimensional dataset according to their extreme value indices. We propose an iterative clustering procedure that sequentially partitions the variables into groups, ordered from the heaviest-tailed to the lightest-tailed distributions. At each step, our method identifies and extracts the group of variables that share the highest extreme value index among those remaining. This approach differs fundamentally from conventional clustering methods, such as two-step methods that cluster pre-estimated extreme value indices. We establish the consistency of the proposed algorithm and demonstrate its finite-sample performance in a simulation study and a real data application.

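The heaviest-to-lightest extraction order described in the abstract can be illustrated with a deliberately simplified sketch. It is not the procedure proposed in the paper: the abstract does not specify the grouping rule, so the sketch substitutes Hill estimates and a fixed tolerance as a stand-in. The function names `hill_estimate`, `peel_clusters`, and `pareto_quantiles`, the parameters `k` and `tol`, and the synthetic data are all assumptions made for illustration.

```python
import math


def hill_estimate(sample, k):
    """Hill estimator of the extreme value index from the k largest observations."""
    top = sorted(sample, reverse=True)[: k + 1]  # k largest values plus the threshold
    threshold = top[k]
    return sum(math.log(x / threshold) for x in top[:k]) / k


def peel_clusters(columns, k, tol):
    """Peel off groups of variables from the heaviest tail to the lightest.

    `columns` maps a variable name to a list of positive observations.
    ASSUMPTION: a variable joins the current group when its Hill estimate is
    within `tol` of the largest estimate among the remaining variables. This
    rule is a stand-in, not the criterion used in the paper.
    """
    remaining = dict(columns)
    groups = []
    while remaining:
        estimates = {name: hill_estimate(x, k) for name, x in remaining.items()}
        top_value = max(estimates.values())
        group = sorted(n for n, g in estimates.items() if top_value - g <= tol)
        groups.append(group)
        for name in group:
            del remaining[name]  # extracted variables are not revisited
    return groups


def pareto_quantiles(gamma, n):
    """Deterministic synthetic data: Pareto quantiles with extreme value index gamma."""
    return [(i / (n + 1)) ** (-gamma) for i in range(1, n + 1)]


if __name__ == "__main__":
    data = {
        "a": pareto_quantiles(1.0, 200),
        "b": pareto_quantiles(0.9, 200),
        "c": pareto_quantiles(0.3, 200),
    }
    print(peel_clusters(data, k=50, tol=0.2))
```

In this sketch each pass re-estimates only the variables that have not yet been extracted, so the returned groups are ordered from the largest estimated extreme value index to the smallest. The choices of `k` and `tol` are arbitrary here and would need principled selection in practice.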