Conditional flow matching for physics-constrained inverse problems with finite training data

This study presents a conditional flow matching framework for solving physics-constrained Bayesian inverse problems. In this setting, samples from the joint distribution of inferred variables and measurements are assumed available, while explicit evaluation of the prior and likelihood densities is not required. We derive a simple and self-contained formulation of both the unconditional and conditional flow matching algorithms, tailored specifically to inverse problems. In the conditional setting, a neural network is trained to learn the velocity field of a probability flow ordinary differential equation that transports samples from a chosen source distribution directly to the posterior distribution conditioned on observed measurements. This black-box formulation accommodates nonlinear, high-dimensional, and potentially non-differentiable forward models without restrictive assumptions on the noise model. We further analyze the behavior of the learned velocity field in the regime of finite training data. Under mild architectural assumptions, we show that overtraining can induce degenerate behavior in the generated conditional distributions, including variance collapse and a phenomenon termed selective memorization, wherein generated samples concentrate around training data points associated with similar observations. A simplified theoretical analysis explains this behavior, and numerical experiments confirm it in practice. We demonstrate that standard early-stopping criteria based on monitoring test loss effectively mitigate such degeneracy. The proposed method is evaluated on several physics-based inverse problems. We investigate the impact of different choices of source distributions, including Gaussian and data-informed priors. Across these examples, conditional flow matching accurately captures complex, multimodal posterior distributions while maintaining computational efficiency.
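The training objective and sampling procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard Gaussian source, a linear interpolation path between source and target samples, and forward-Euler integration of the probability flow ODE, none of which the abstract specifies. The names `cfm_loss`, `sample_posterior`, `velocity_fn`, and `collapsed_velocity` are hypothetical, and any callable stands in for the neural network.

```python
import numpy as np


def cfm_loss(velocity_fn, x1, y, rng):
    """Monte Carlo estimate of a conditional flow matching loss.

    x1: (n, d) inferred variables; y: (n, m) paired measurements,
    i.e. samples from the joint distribution.
    """
    n, d = x1.shape
    x0 = rng.standard_normal((n, d))        # source samples (assumed Gaussian)
    t = rng.uniform(0.0, 1.0, size=(n, 1))  # one time in [0, 1) per sample
    xt = (1.0 - t) * x0 + t * x1            # assumed linear interpolation path
    target = x1 - x0                        # velocity along that path
    pred = velocity_fn(xt, t, y)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def sample_posterior(velocity_fn, y_obs, n_samples, dim, rng, n_steps=100):
    """Forward-Euler integration of dx/dt = v(x, t, y_obs) from t=0 to t=1."""
    x = rng.standard_normal((n_samples, dim))
    y = np.tile(np.asarray(y_obs, dtype=float).reshape(1, -1), (n_samples, 1))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((n_samples, 1), k * dt)
        x = x + dt * velocity_fn(x, t, y)
    return x


# Toy check with a single training pair (a, y_a): the field below attains
# zero loss on that pair and sends every source sample to the point a.
a = np.array([[2.0, -1.0]])


def collapsed_velocity(x, t, y):
    return (a - x) / (1.0 - t)


rng = np.random.default_rng(0)
x1_train = np.repeat(a, 256, axis=0)
y_train = np.zeros((256, 1))
loss = cfm_loss(collapsed_velocity, x1_train, y_train, rng)
samples = sample_posterior(collapsed_velocity, [0.0], 5, 2, rng)
```

In this toy case the zero-loss velocity field transports every source sample onto the lone training point, so the generated samples have no spread. It is only a one-point caricature of the variance collapse discussed above, not a reproduction of the paper's analysis.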
