Repulsive Mixture Model with Projection Determinantal Point Process

In many scientific domains, clustering aims to reveal interpretable latent structure that reflects relevant subpopulations or processes. Widely used Bayesian mixture models for model-based clustering often produce overlapping or redundant components because the priors on component locations are specified independently, which hinders interpretability. To mitigate this, repulsive priors have been proposed to encourage well-separated components, yet existing approaches face both computational and theoretical challenges. We introduce a fully tractable Bayesian repulsive mixture model by assigning a projection Determinantal Point Process (DPP) prior to the component locations. Projection DPPs induce strong repulsion and allow exact sampling, enabling parsimonious and interpretable posterior clustering. Leveraging their analytical tractability, we derive closed-form posterior and predictive distributions. These results, in turn, enable two efficient inference algorithms: a conditional Gibbs sampler and the first fully implementable marginal sampler for DPP-based mixtures. We also provide strong frequentist guarantees, including posterior consistency for density estimation, elimination of redundant components, and contraction of the mixing measure. Simulation studies confirm superior mixing and clustering performance compared to alternatives in misspecified settings. Finally, we demonstrate the utility of our method on event-related potential functional data, where it uncovers interpretable neuro-cognitive subgroups. Our results support projection DPP mixtures as a theoretically sound and practically effective solution for Bayesian clustering.
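
To illustrate the claim that projection DPPs allow exact sampling and induce repulsion, the following is a minimal sketch on a finite grid. It is not the sampler or the continuous-space prior used in this work. It assumes a discretized location space, a hypothetical cosine feature basis, and the standard chain-rule (Schur complement) sampler for a finite projection kernel. The function name `sample_projection_dpp` and all parameters are illustrative only.

```python
import numpy as np

def sample_projection_dpp(features, rng):
    """Draw one exact sample from a finite projection DPP (illustrative sketch).

    features: (N, k) array assumed to have full column rank; its column
    space defines the rank-k projection kernel K = Q Q^T.
    Returns a list of k distinct indices in range(N).
    """
    Q, _ = np.linalg.qr(features)      # orthonormal basis of the column space
    K = Q @ Q.T                        # projection kernel, trace(K) = k
    k = Q.shape[1]
    chosen = []
    for _ in range(k):
        # Conditional inclusion probabilities are proportional to the
        # diagonal of the kernel conditioned on the points chosen so far.
        p = np.clip(np.diag(K), 0.0, None)
        p /= p.sum()
        j = int(rng.choice(len(p), p=p))
        chosen.append(j)
        # Schur complement update: condition the kernel on item j.
        K = K - np.outer(K[:, j], K[j, :]) / K[j, j]
        # Zero out round-off so j cannot be selected again.
        K[j, :] = 0.0
        K[:, j] = 0.0
    return chosen

# Hypothetical setup: 50 grid locations on [0, 1], 5 cosine features.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
feats = np.cos(np.pi * np.outer(grid, np.arange(5)))
sample = sample_projection_dpp(feats, rng)
print(sorted(grid[sample]))
```

Because the kernel has rank 5, every draw contains exactly 5 distinct locations, and nearby grid points are unlikely to be selected together, which is the repulsion effect described above.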
