TabPFN Extensions for Interpretable Geotechnical Modelling

Geotechnical site characterisation relies on sparse, heterogeneous borehole data, where uncertainty quantification and model interpretability are as critical as predictive accuracy for reliable engineering decisions. This paper presents an exploratory investigation of TabPFN, a transformer-based tabular foundation model that uses in-context learning, and its extension library tabpfn-extensions, applied to two geotechnical inference tasks: (1) soil-type classification using N-value and shear-wave velocity data from a synthetic geotechnical dataset, and (2) iterative imputation of five missing mechanical parameters ($s_\mathrm{u}$, $E_\mathrm{u}$, $\sigma'_\mathrm{p}$, $C_\mathrm{c}$, $C_\mathrm{v}$) in benchmark problem BM/AirportSoilProperties/2/2025. We apply cosine-similarity analysis to TabPFN-derived embeddings, visualise full posterior distributions from an iterative inference procedure, and compute SHAP-based feature importance, all without model retraining. The learned embeddings clearly separate Clay and Sand samples without explicit soil-type supervision. Iterative imputation improves predictions for four of the five target parameters, with posterior widths that reflect physically reasonable, parameter-specific uncertainty. SHAP analysis reveals the inter-parameter dependency structure and recovers established geotechnical relationships, including the Skempton compression index correlation and the inverse dependence of preconsolidation pressure on water content. These results suggest that foundation-model-based tools can support interpretable, uncertainty-aware parameter inference in data-scarce geotechnical practice.
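As an illustration of the cosine-similarity analysis mentioned above, the following is a minimal sketch only. It assumes the embeddings are already available as a NumPy array with one row per sample. The function name `cosine_similarity_matrix` and the synthetic "Clay" and "Sand" vectors are hypothetical stand-ins chosen for this example; they are not TabPFN outputs and do not reproduce the paper's data or pipeline.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine-similarity matrix of the row vectors."""
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Hypothetical stand-in embeddings (NOT produced by TabPFN): two tight
# clusters in a 4-dimensional space, labelled "clay" and "sand" for illustration.
rng = np.random.default_rng(0)
clay = rng.normal(loc=[1.0, 0.0, 0.0, 0.0], scale=0.05, size=(5, 4))
sand = rng.normal(loc=[0.0, 1.0, 0.0, 0.0], scale=0.05, size=(5, 4))
embeddings = np.vstack([clay, sand])

sim = cosine_similarity_matrix(embeddings)

# Mean similarity inside the first cluster versus across the two clusters.
within = sim[:5, :5].mean()
between = sim[:5, 5:].mean()
print(f"within-cluster mean similarity:  {within:.3f}")
print(f"between-cluster mean similarity: {between:.3f}")
```

If two soil types are well separated in embedding space, the within-group block of the similarity matrix should be noticeably higher than the between-group block, which is the pattern this toy example is built to show.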
