Clustering bounded data poses particular challenges in statistical analysis because of the constraints imposed on the data values. This paper introduces a novel model-based clustering method designed specifically for bounded data. We extend the transformation-based approach to Gaussian mixture density estimation introduced by Scrucca (2019) into a probabilistic clustering algorithm for data with bounded support, one that achieves accurate clustering while respecting the natural bounds of the variables. In our proposal, a flexible range-power transformation maps the data from their bounded domain to the unrestricted real space, so that Gaussian mixture models can be estimated in the transformed space. This approach improves cluster recovery and interpretation, especially for complex distributions within bounded domains. We evaluate the performance of the proposed method on real-world data applications involving both fully and partially bounded data, in univariate and multivariate settings. The results show the effectiveness of our approach and its advantages over both traditional and advanced model-based clustering techniques that employ distributions with bounded support.
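To make the transform-then-cluster idea concrete, here is a minimal univariate sketch. It is not the authors' implementation. It assumes one common form of range-power transformation: a range step, (x - lb)/(ub - x) for data bounded on both sides or x - lb for data bounded only below, followed by a Box-Cox power step with parameter λ. With both bounds and λ = 0 the map reduces to the logit. The sketch simplifies in several ways: λ is fixed rather than estimated, a plain EM algorithm fits the Gaussian mixture on the transformed scale, and the density on the original scale is recovered through the change-of-variables formula f(x) = g(y(x)) |dy/dx|. All function names (`range_power`, `fit_gmm_1d`, `cluster_bounded`, `bounded_density`) are hypothetical.

```python
import numpy as np


def range_power(x, lb=None, ub=None, lam=0.0):
    """Assumed range-power transform: range step, then Box-Cox power step.

    Returns the transformed values y and the log-Jacobian log|dy/dx|.
    """
    x = np.asarray(x, dtype=float)
    if lb is not None and ub is not None:
        t = (x - lb) / (ub - x)                          # (lb, ub) -> (0, inf)
        log_dt = np.log(ub - lb) - 2.0 * np.log(ub - x)  # log dt/dx
    elif lb is not None:
        t = x - lb                                       # (lb, inf) -> (0, inf)
        log_dt = np.zeros_like(x)
    else:
        raise ValueError("this sketch needs at least a lower bound")
    # Box-Cox step; lam = 0 (log) maps (0, inf) onto the whole real line
    y = np.log(t) if lam == 0 else (t ** lam - 1.0) / lam
    log_jac = log_dt + (lam - 1.0) * np.log(t)           # chain rule: dy/dt = t**(lam-1)
    return y, log_jac


def log_joint(y, w, mu, var):
    """Matrix with entries log(w_k) + log N(y_i | mu_k, var_k)."""
    return (np.log(w) - 0.5 * np.log(2.0 * np.pi * var)
            - 0.5 * (y[:, None] - mu) ** 2 / var)


def row_logsumexp(a):
    """Numerically stable log-sum-exp over each row."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()


def fit_gmm_1d(y, k, n_iter=300, seed=0):
    """Plain EM for a univariate k-component Gaussian mixture."""
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)
    mu = np.sort(rng.choice(y, size=k, replace=False))
    var = np.full(k, y.var())
    for _ in range(n_iter):
        lp = log_joint(y, w, mu, var)
        z = np.exp(lp - row_logsumexp(lp)[:, None])      # E-step: responsibilities
        nk = z.sum(axis=0)                               # M-step: weighted updates
        w = nk / y.size
        mu = (z * y[:, None]).sum(axis=0) / nk
        var = (z * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return w, mu, var


def cluster_bounded(x, k, lb=None, ub=None, lam=0.0, seed=0):
    """Fit the mixture on the transformed scale; return MAP labels and parameters."""
    y, _ = range_power(x, lb, ub, lam)
    params = fit_gmm_1d(y, k, seed=seed)
    labels = log_joint(y, *params).argmax(axis=1)
    return labels, params


def bounded_density(x, params, lb=None, ub=None, lam=0.0):
    """Mixture density on the original bounded scale: f(x) = g(y(x)) * |dy/dx|."""
    y, log_jac = range_power(x, lb, ub, lam)
    return np.exp(row_logsumexp(log_joint(y, *params)) + log_jac)


if __name__ == "__main__":
    # Two groups of proportions on (0, 1), clustered on the logit scale (lam = 0)
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.beta(2, 10, 200), rng.beta(10, 2, 200)])
    labels, params = cluster_bounded(x, k=2, lb=0.0, ub=1.0)
    print("weights", params[0].round(2), "means", params[1].round(2))
```

Because the Gaussian mixture is fitted where the data are unrestricted, every component automatically respects the bounds when mapped back. In the full method, λ would also be estimated and the multivariate case would apply one transformation per variable. Both are omitted here for brevity.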