A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior


Jorge Quesada, Chen Zhou, Prithwijit Chowdhury, Mohammad Alotaibi, Ahmad Mustafa, Yusufjon Kumakov, Mohit Prabhushankar, Ghassan AlRegib

Machine learning has taken on a critical role in seismic interpretation workflows, especially in fault delineation tasks. However, despite the recent proliferation of pretrained models and synthetic datasets, the field still lacks a systematic understanding of the generalizability limits of these models across seismic data representing diverse geologic, acquisition, and processing settings. Distributional shifts between data sources, limitations in fine-tuning strategies and labeled data accessibility, and inconsistent evaluation protocols all remain major roadblocks to deploying reliable models in real-world exploration. In this paper, we present the first large-scale benchmarking study explicitly designed to provide guidelines for domain shift strategies in seismic interpretation. Our benchmark spans over 200 combinations of model architectures, datasets, and training strategies across three datasets, both synthetic and real: FaultSeg3D, CRACKS, and Thebe. We systematically assess pretraining, fine-tuning, and joint training under varying domain shifts. Our analysis shows that common fine-tuning practices can lead to catastrophic forgetting, especially when source and target datasets are disjoint, and that larger models such as SegFormer are more robust than smaller architectures. We also find that domain adaptation methods outperform fine-tuning when shifts are large, yet underperform when domains are similar. Finally, we complement segmentation metrics with a novel analysis based on fault characteristic descriptors, revealing how models absorb structural biases from training datasets. Overall, we establish a robust experimental baseline that provides insights into tradeoffs in current fault delineation workflows and highlights directions for building more generalizable and interpretable models.
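As a rough illustration of how catastrophic forgetting might be quantified, the sketch below scores a model on source-domain data before and after fine-tuning on a target domain and reports the drop. It is a minimal sketch, not the paper's evaluation protocol: the choice of the Dice coefficient as the segmentation metric, the helper names `dice` and `forgetting`, and the toy masks are all assumptions made for illustration.

```python
def dice(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention (an assumption here): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def forgetting(score_before, score_after):
    """Drop in source-domain score caused by fine-tuning on the target domain."""
    return score_before - score_after


# Toy fault masks (hypothetical values, not results from the paper).
source_label = [0, 1, 1, 0, 1, 0]
pred_pretrained = [0, 1, 1, 0, 1, 1]  # source-domain prediction before fine-tuning
pred_finetuned = [1, 0, 1, 0, 0, 0]   # source-domain prediction after fine-tuning

before = dice(pred_pretrained, source_label)
after = dice(pred_finetuned, source_label)
drop = forgetting(before, after)

print(f"source Dice before: {before:.3f}, after: {after:.3f}, forgetting: {drop:.3f}")
```

A positive drop means the model lost source-domain performance after adapting to the target domain. Repeating the same comparison for each source and target dataset pair gives one simple way to compare training strategies.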

