Model-based clustering is a powerful tool for discovering hidden structure in data by grouping observational units that exhibit similar response values. Recently, clustering methods have been developed that incorporate an ``initial'' partition informed by expert opinion. Partitions that differ from the initial one, as measured by some similarity criterion, are then down-weighted, i.e., they are assigned reduced probabilities. These methods represent an exciting new direction in the development of clustering techniques. We add to this literature a method that flexibly permits assigning varying levels of uncertainty to any subset of the partition. This is particularly useful in practice, as clear prior information about the entire partition is rarely available. Our approach is not based on partition penalties; instead, it considers individual allocation probabilities for each unit (e.g., locally weighted prior information). We illustrate the gains in prior specification flexibility via simulation studies and an application to a dataset on the spatio-temporal evolution of ${\rm PM}_{10}$ measurements in Germany.