In recent years, there has been growing demand to discern clusters of subjects in datasets characterized by a large set of features. These clusters are often highly variable in size and may exhibit partial hierarchical structures. In this context, model-based clustering approaches with nonparametric priors are gaining attention in the literature due to their flexibility and adaptability to new data. However, current approaches still face challenges in recognizing hierarchical cluster structures and in handling very small clusters or singletons. To address these limitations, we propose a novel infinite mixture model with kernels organized within a multiscale structure. Leveraging a careful specification of the kernel parameters, our method allows the inclusion of additional information guiding possible hierarchies among clusters while maintaining flexibility. We provide theoretical support and an elegant, parsimonious formulation based on infinite factorization that enables efficient inference via a Gibbs sampler.