Accurate seismic velocity estimates are vital for understanding Earth's subsurface structures, assessing natural resources, and evaluating seismic hazards. Machine learning-based inversion algorithms have shown promising performance in regional (i.e., for exploration) and global velocity estimation, but their effectiveness hinges on access to large and diverse training datasets whose distributions generally cover the target solutions. Improving the precision and reliability of velocity estimation also requires incorporating prior information, e.g., geological classes, well logs, and subsurface structures, yet current statistical and neural network-based methods are not flexible enough to handle such multi-modal information. To address both challenges, we propose using conditional generative diffusion models for seismic velocity synthesis, into which these priors can be readily incorporated. This approach generates seismic velocities that closely match the expected target distribution, yielding datasets informed by both expert knowledge and measured data to support the training of data-driven geophysical methods. We demonstrate the flexibility and effectiveness of our method by training diffusion models on the OpenFWI dataset under various conditions, including class labels, well logs, reflectivity images, and combinations of these priors. The performance of the approach under out-of-distribution conditions further underscores its generalization ability, showing its potential to provide tailored priors for velocity inverse problems and to create specific training datasets for machine learning-based geophysical applications.