We propose a new class of models for variable clustering, called Asymptotic Independent block (AI-block) models, which define population-level clusters based on the independence of the maxima of a multivariate stationary mixing random process among clusters. This class of models is identifiable, meaning that there exists a maximal element with respect to a partial order on partitions, which makes statistical inference possible. We also present an algorithm, depending on a tuning parameter, that recovers the clusters of variables without specifying the number of clusters \emph{a priori}. We provide theoretical insights into the consistency of the algorithm, showing that under certain conditions it recovers the clusters with a computational complexity that is polynomial in the dimension. A data-driven selection method for the tuning parameter is also proposed. To illustrate the approach, we apply the method to real datasets from neuroscience and environmental science. These applications highlight the potential and versatility of the proposed approach.