Interpreting diffusion-based generative models is urgently needed, as their applications continue to expand while their underlying procedures largely remain a black box. We ask a critical question: how can the diffusion generation process be interpreted and understood? To answer it, we propose Patronus, an interpretable diffusion model that incorporates a prototypical network to encode semantics in visual patches, revealing what visual patterns are modeled and where and when they emerge throughout denoising. This interpretability provides deeper insight into the generative mechanism, enabling the detection of shortcut learning via unwanted correlations and the tracing of semantic emergence across timesteps. We evaluate Patronus on four natural image datasets and one medical imaging dataset, demonstrating both faithful interpretability and strong generative performance. With this work, we open new avenues for understanding and steering diffusion models through prototype-based interpretability.\\ Our code is available at https://github.com/nina-weng/patronus.