Learned priors based on deep generative models offer data-driven regularization for seismic inversion, but training them requires a dataset of representative subsurface models -- a resource that is inherently scarce in geoscience applications. Since the training objective of most generative models can be cast as maximum likelihood on a finite dataset, any such model risks converging to the empirical distribution -- effectively memorizing the training examples rather than learning the underlying geological distribution. We show that the posterior under such a memorized prior reduces to a reweighted empirical distribution -- i.e., a likelihood-weighted lookup among the stored training examples. For diffusion models specifically, memorization yields a Gaussian mixture prior in closed form, and linearizing the forward operator around each training example gives a Gaussian mixture posterior whose components have widths and shifts governed by the local Jacobian. We validate these predictions on a stylized inverse problem and demonstrate the consequences of memorization through diffusion posterior sampling for full waveform inversion.
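The claim that a memorized prior turns the posterior into a likelihood-weighted lookup can be illustrated with a minimal sketch. It assumes a uniform empirical prior over the stored training examples and additive Gaussian noise with standard deviation `noise_std`. The function name `memorized_posterior_weights`, the random linear operator `A`, and the toy dimensions are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def memorized_posterior_weights(y, train_models, forward, noise_std):
    """Posterior weights over stored training examples under an empirical prior.

    Assumes a uniform prior over the examples and i.i.d. Gaussian noise, so the
    weight of example i is proportional to exp(-||y - F(x_i)||^2 / (2 sigma^2)).
    """
    # Data residual for every stored example, shape (n_examples, n_data).
    residuals = np.stack([y - forward(x) for x in train_models])
    # Gaussian log-likelihood up to an additive constant.
    log_lik = -0.5 * np.sum(residuals**2, axis=1) / noise_std**2
    # Subtract the maximum before exponentiating for numerical stability.
    log_lik = log_lik - log_lik.max()
    weights = np.exp(log_lik)
    return weights / weights.sum()


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))             # hypothetical linear forward operator
train_models = rng.standard_normal((4, 5))  # four stored "training" models


def forward(x):
    # Toy stand-in for the forward operator: a fixed linear map.
    return A @ x


# Noise-free data generated from stored example 2.
y = forward(train_models[2])
weights = memorized_posterior_weights(y, train_models, forward, noise_std=0.1)
```

In this toy setting the posterior mass concentrates on the stored example that best explains the data. Nothing outside the training set can receive any weight, which is the sense in which inference under a memorized prior acts as a lookup.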