Diffusion models now generate high-quality, diverse samples, with research increasingly focused on building more powerful models. Although ensembling is a well-established way to improve supervised models, its application to unconditional score-based diffusion models remains largely unexplored. In this work we investigate whether it provides tangible benefits for generative modelling. We find that while ensembling the scores generally improves the score-matching loss and model likelihood, it fails to consistently enhance perceptual quality metrics such as FID on image datasets. We confirm this observation across a breadth of aggregation rules, using Deep Ensembles and Monte Carlo Dropout, on CIFAR-10 and FFHQ. We investigate possible explanations for this discrepancy, such as the link between score estimation and image quality. We also examine tabular data using random forests and find that one aggregation strategy outperforms the others. Finally, we provide theoretical insights into the summation of score models, which shed light not only on ensembling but also on several model composition techniques (e.g. guidance).
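The simplest aggregation rule mentioned above is mean aggregation: each ensemble member produces a score estimate, and the estimates are averaged pointwise before being used in the sampler. The following minimal sketch illustrates this with toy closed-form score functions standing in for trained networks; the function names and the Gaussian toy scores are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def make_toy_score(shift):
    # Toy stand-in for a trained score network: the exact score of a
    # 1-D Gaussian N(shift, 1) is s(x) = -(x - shift).
    # (Illustrative assumption; real members would be neural networks.)
    return lambda x, t: -(x - shift)

# A "deep ensemble" of three toy score models with slightly different fits.
ensemble = [make_toy_score(s) for s in (-0.1, 0.0, 0.1)]

def ensemble_score(x, t, models):
    # Mean aggregation: average the members' score estimates pointwise.
    return np.mean([m(x, t) for m in models], axis=0)

x = np.array([1.0, 2.0])
s = ensemble_score(x, t=0.5, models=ensemble)
# The toy shifts average to 0.0, so s == -(x - 0.0) = [-1.0, -2.0]
```

Other aggregation rules discussed in the paper (e.g. alternatives to the plain mean) would replace the `np.mean` call; the sampler itself is unchanged, since it only consumes the aggregated score.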