This study presents a conditional flow matching framework for solving physics-constrained Bayesian inverse problems. In this setting, samples from the joint distribution of inferred variables and measurements are assumed available, while explicit evaluation of the prior and likelihood densities is not required. We derive a simple and self-contained formulation of both the unconditional and conditional flow matching algorithms, tailored specifically to inverse problems. In the conditional setting, a neural network is trained to learn the velocity field of a probability flow ordinary differential equation that transports samples from a chosen source distribution directly to the posterior distribution conditioned on observed measurements. This black-box formulation accommodates nonlinear, high-dimensional, and potentially non-differentiable forward models without restrictive assumptions on the noise model. We further analyze the behavior of the learned velocity field in the regime of finite training data. Under mild architectural assumptions, we show that overtraining can induce degenerate behavior in the generated conditional distributions, including variance collapse and a phenomenon termed selective memorization, wherein generated samples concentrate around training data points associated with similar observations. A simplified theoretical analysis explains this behavior, and numerical experiments confirm it in practice. We demonstrate that standard early-stopping criteria based on monitoring test loss effectively mitigate such degeneracy. The proposed method is evaluated on several physics-based inverse problems. We investigate the impact of different choices of source distributions, including Gaussian and data-informed priors. Across these examples, conditional flow matching accurately captures complex, multimodal posterior distributions while maintaining computational efficiency.
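To make the training and sampling setup concrete, the following is a minimal sketch of conditional flow matching for posterior sampling, assuming a linear interpolation path, a standard Gaussian source distribution, and PyTorch; the network `VelocityNet`, the helpers `cfm_loss` and `sample_posterior`, and all dimensions are illustrative placeholders rather than the paper's actual implementation.

```python
# Minimal sketch of conditional flow matching (CFM) for Bayesian posterior
# sampling. Assumptions (not from the paper): a linear interpolation path,
# a standard Gaussian source, and an illustrative MLP velocity network.
import torch
import torch.nn as nn

DIM_X, DIM_Y = 8, 4  # hypothetical sizes of inferred variables and measurements


class VelocityNet(nn.Module):
    """v_theta(x_t, t, y): velocity field of the probability flow ODE."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DIM_X + DIM_Y + 1, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, DIM_X),
        )

    def forward(self, x_t, t, y):
        return self.net(torch.cat([x_t, t, y], dim=-1))


def cfm_loss(model, x1, y):
    """CFM regression loss for one batch of joint samples (x1, y)."""
    x0 = torch.randn_like(x1)          # sample from the source distribution
    t = torch.rand(x1.shape[0], 1)     # t ~ Uniform(0, 1)
    x_t = (1.0 - t) * x0 + t * x1      # linear path from source to data
    target = x1 - x0                   # exact velocity of the linear path
    return ((model(x_t, t, y) - target) ** 2).mean()


@torch.no_grad()
def sample_posterior(model, y, n_steps=100):
    """Integrate the learned ODE from the source toward the posterior given y."""
    x = torch.randn(y.shape[0], DIM_X)
    dt = 1.0 / n_steps
    for k in range(n_steps):           # forward Euler integration of the ODE
        t = torch.full((y.shape[0], 1), k * dt)
        x = x + dt * model(x, t, y)
    return x
```

In this sketch, training would iterate `cfm_loss` over minibatches of joint samples while monitoring a held-out test loss for early stopping, in line with the degeneracy mitigation described above; no evaluation of the prior or likelihood densities is required at any point.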