Uncertainty evaluation is critical in scientific and engineering inverse problems. However, existing benchmarks for Diffusion Inverse Solvers (DIS) focus primarily on reconstruction accuracy and overlook uncertainty and distributional behavior. Because stochastic inverse solvers represent uncertainty through diffusion-based posterior samples, how well those samples capture the target posterior distribution is an important aspect of uncertainty quantification. To address this limitation and better understand the distributional behavior of diffusion samplers, we systematically study the posterior fidelity of a broad range of existing DIS methods in controlled simulation settings where the true posterior is known analytically. Furthermore, to enable posterior-aware evaluation on real-world inverse problems where the ground-truth posterior is unavailable, we propose score-based Kernel Stein Discrepancy (score-KSD), a theoretically grounded, ground-truth-free metric that measures how consistent the distribution of samples generated by a DIS method is with the target posterior score field induced by the forward model and the learned diffusion prior. Through both simulation experiments and real-world inverse problems, we validate the effectiveness of score-KSD and show that it provides meaningful posterior fidelity diagnostics beyond reconstruction accuracy, revealing that higher reconstruction accuracy does not necessarily imply better posterior consistency.
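To make the idea of a ground-truth-free, score-based check concrete, the following is a minimal sketch of a generic kernel Stein discrepancy (V-statistic, RBF kernel) on a toy linear-Gaussian inverse problem. It is not the paper's score-KSD: the function names `posterior_score` and `rbf_ksd`, the fixed bandwidth, the forward matrix `A`, the noise level, and the standard-normal prior standing in for a learned diffusion prior are all assumptions chosen for illustration. The point it shows is that the discrepancy needs only samples and the posterior score (prior score plus likelihood score), never samples from the true posterior.

```python
import numpy as np


def posterior_score(x, A, y, noise_std):
    """Score of p(x | y) for y = A x + noise with a standard-normal prior.

    The N(0, I) prior is an illustrative stand-in for a learned diffusion
    prior score; the likelihood term comes from the forward model A.
    """
    prior_score = -x
    likelihood_score = (y - x @ A.T) @ A / noise_std**2
    return prior_score + likelihood_score


def rbf_ksd(samples, scores, bandwidth=1.0):
    """Squared kernel Stein discrepancy (V-statistic) with an RBF kernel."""
    x = np.asarray(samples, dtype=float)
    s = np.asarray(scores, dtype=float)
    n, d = x.shape
    h2 = bandwidth**2
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff**2, axis=-1)
    k = np.exp(-sq_dist / (2.0 * h2))
    score_dot = s @ s.T                            # s_i . s_j
    s_i_diff = np.einsum("id,ijd->ij", s, diff)    # s_i . (x_i - x_j)
    s_j_diff = np.einsum("jd,ijd->ij", s, diff)    # s_j . (x_i - x_j)
    # Stein kernel: the score terms plus the trace of the mixed kernel Hessian.
    stein = k * (score_dot + (s_i_diff - s_j_diff) / h2 + d / h2 - sq_dist / h2**2)
    return float(np.mean(stein))


# Toy problem (all values are illustrative assumptions).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5]])
y = np.array([1.0])
noise_std = 0.5

# Analytical posterior, used here only to draw reference samples.
precision = np.eye(2) + A.T @ A / noise_std**2
cov = np.linalg.inv(precision)
mean = cov @ (A.T @ y) / noise_std**2

exact = rng.multivariate_normal(mean, cov, size=400)
collapsed = mean + 0.3 * (exact - mean)   # right mean, under-dispersed
shifted = exact + 1.0                     # right spread, wrong location

for name, xs in [("exact", exact), ("collapsed", collapsed), ("shifted", shifted)]:
    value = rbf_ksd(xs, posterior_score(xs, A, y, noise_std))
    print(f"{name:10s} KSD^2 = {value:.4f}")
```

In this sketch the under-dispersed sampler keeps the correct posterior mean, so a point-estimate accuracy metric would not penalize it, yet its discrepancy is larger than that of the exact samples. The fixed bandwidth is a simplification; in practice a data-dependent choice such as the median heuristic is common.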