The usefulness of Bayesian models for density and cluster estimation is well established across multiple literatures. However, a tension remains between simpler, more interpretable models and more flexible, complex ones. In this paper, we propose a novel method that integrates these two approaches by projecting the fit of a flexible, overparameterized model onto a lower-dimensional parametric surrogate, which serves as a summary. This process increases interpretability while preserving most of the fit of the original model. Our approach involves three main steps. First, we fit the data using nonparametric or overparameterized models. Second, using a decision-theoretic approach, we project the posterior predictive distribution of the original model onto a sequence of parametric summary point estimates of varying dimensions. Finally, we take the summary estimate from the second step that best approximates the original model and construct uncertainty quantification for it by projecting the original posterior distribution. We demonstrate the effectiveness of our method for generating summaries of both nonparametric and overparameterized models, delivering point estimates and uncertainty quantification for density and cluster summaries across synthetic and real datasets.
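The three steps can be illustrated with a minimal sketch, which is not the implementation used in the paper. It rests on several assumptions made only for illustration: the flexible model is replaced by a Bayesian bootstrap (Dirichlet weights over the observations), the surrogate family is a single Gaussian, and the projection minimizes the Kullback-Leibler divergence from the flexible fit to the surrogate, which for a Gaussian reduces to moment matching. The function names `dirichlet_weights`, `project_to_gaussian`, and `summarize` are hypothetical, and the sketch uses one surrogate dimension only, so it omits the comparison across summaries of varying dimensions.

```python
import math
import random


def dirichlet_weights(n, rng):
    """One draw of Dirichlet(1, ..., 1) weights via normalised exponentials."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]


def project_to_gaussian(data, weights):
    """Project a weighted empirical distribution onto the Gaussian family.

    Minimising KL(p || q) over Gaussian q amounts to matching the
    mean and variance of p (an assumption of this sketch, not
    necessarily the loss used in the paper).
    """
    mu = sum(w * x for w, x in zip(weights, data))
    var = sum(w * (x - mu) ** 2 for w, x in zip(weights, data))
    return mu, math.sqrt(var)


def summarize(data, n_draws=2000, seed=0):
    """Return a point summary (mu, sigma) and a 95% interval for mu."""
    rng = random.Random(seed)
    n = len(data)

    # Step 1 (stand-in): posterior draws of a flexible model, here
    # Bayesian-bootstrap weights over the observed data points.
    draws = [dirichlet_weights(n, rng) for _ in range(n_draws)]

    # Step 2: project the posterior predictive onto the surrogate.
    # Under the Bayesian bootstrap the predictive is the empirical
    # distribution, i.e. uniform weights 1/n.
    point = project_to_gaussian(data, [1.0 / n] * n)

    # Step 3: project every posterior draw onto the same surrogate to
    # obtain a distribution over the summary parameters.
    projected = [project_to_gaussian(data, w) for w in draws]
    mus = sorted(m for m, _ in projected)
    lo = mus[int(0.025 * n_draws)]
    hi = mus[int(0.975 * n_draws) - 1]
    return point, (lo, hi)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(2.0, 1.0) for _ in range(200)]
    (mu, sigma), interval = summarize(sample)
    print("point summary:", mu, sigma)
    print("95% interval for mu:", interval)
```

In this sketch the uncertainty in the summary is inherited entirely from the flexible posterior, because each posterior draw is mapped to its own projected parameters. A richer surrogate, such as a finite mixture with a varying number of components, would need an iterative projection in place of the closed-form moment matching.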