The paper presents a novel approach to unsupervised clustering. The proposed method enhances existing models from the literature by using the proper Bayesian bootstrap to improve the robustness and interpretability of the results. The approach has two steps: k-means clustering is first used for prior elicitation, and the proper Bayesian bootstrap is then applied as the resampling method within an ensemble clustering framework. Results are analyzed through uncertainty measures based on Shannon entropy. The proposal gives a clear indication of the optimal number of clusters and a better representation of the clustered data. Empirical results on simulated data illustrate the methodological and empirical advances obtained.
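The two-step procedure described in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. Several choices here are assumptions made only for illustration: the prior is taken to be a Gaussian mixture centred on the k-means centroids, the prior weight `w` and pseudo-sample size `m` are free parameters, and uncertainty is measured as the binary Shannon entropy of co-association frequencies. All function names (`weighted_kmeans`, `kmeans_prior`, `proper_bayesian_bootstrap`, `ensemble_entropy`) are hypothetical.

```python
import numpy as np


def sqdist(X, centers):
    # Squared Euclidean distance from every row of X to every center.
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)


def weighted_kmeans(X, k, weights, rng, n_iter=50):
    # Plain Lloyd iterations with observation weights (illustrative, not tuned).
    idx = rng.choice(len(X), size=k, replace=False, p=weights / weights.sum())
    centers = X[idx].copy()
    for _ in range(n_iter):
        labels = sqdist(X, centers).argmin(1)
        for j in range(k):
            mask = labels == j
            if weights[mask].sum() > 0:  # skip empty clusters
                centers[j] = np.average(X[mask], axis=0, weights=weights[mask])
    return centers


def kmeans_prior(X, k, rng):
    # ASSUMPTION: the prior F0 is a Gaussian mixture built from a k-means fit.
    centers = weighted_kmeans(X, k, np.ones(len(X)), rng)
    labels = sqdist(X, centers).argmin(1)
    props = np.bincount(labels, minlength=k) / len(X)
    sd = np.array([
        X[labels == j].std(axis=0) if (labels == j).sum() > 1 else X.std(axis=0)
        for j in range(k)
    ])

    def sample(size, rng):
        comp = rng.choice(k, size=size, p=props)
        return centers[comp] + sd[comp] * rng.normal(size=(size, X.shape[1]))

    return sample


def proper_bayesian_bootstrap(X, prior_sampler, w, m, rng):
    # Draw m pseudo-observations from the mixture of prior and empirical
    # distribution, with prior mass w / (w + n), then attach Dirichlet weights.
    n = len(X)
    from_prior = rng.random(m) < w / (w + n)
    pseudo = X[rng.integers(0, n, size=m)]  # fancy indexing returns a copy
    pseudo[from_prior] = prior_sampler(int(from_prior.sum()), rng)
    weights = rng.dirichlet(np.full(m, (n + w) / m))
    return pseudo, weights


def ensemble_entropy(X, k, n_rep=50, w=10.0, seed=0):
    # Cluster each bootstrap replicate, then record how often each pair of
    # original points lands in the same cluster (co-association matrix).
    rng = np.random.default_rng(seed)
    n = len(X)
    prior = kmeans_prior(X, k, rng)
    coassoc = np.zeros((n, n))
    for _ in range(n_rep):
        pseudo, wts = proper_bayesian_bootstrap(X, prior, w, m=n, rng=rng)
        centers = weighted_kmeans(pseudo, k, wts, rng)
        labels = sqdist(X, centers).argmin(1)
        coassoc += labels[:, None] == labels[None, :]
    p = coassoc / n_rep
    # ASSUMPTION: uncertainty is the binary Shannon entropy (in bits) of each
    # co-association frequency, averaged per point.
    q = np.clip(p, 1e-12, 1 - 1e-12)
    h = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))
    return p, h.mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(5, 0.5, (30, 2))])
    for k in (2, 3, 4):
        _, h = ensemble_entropy(X, k)
        print(k, round(float(h.mean()), 4))
```

Under these assumptions, comparing the mean entropy across candidate values of `k` is one possible way to read off a preferred number of clusters: a lower value means the replicates agree more on which points belong together. The abstract does not specify the exact entropy measure or selection rule, so this criterion should be read as a placeholder.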