Uncertainty and Explainable Analysis of Machine Learning Model for Reconstruction of Sonic Slowness Logs

Well logs are valuable in oil and gas fields because they help determine the lithology of the formations surrounding the borehole and the location and reserves of subsurface oil and gas reservoirs. However, important logs are often missing in horizontal or old wells, which poses a challenge in field applications. In this paper, we use data from the 2020 machine learning competition of the SPWLA, whose task is to predict the missing compressional wave slowness and shear wave slowness logs from other logs in the same borehole. We employ the NGBoost algorithm to construct an ensemble learning model that predicts both the results and their uncertainty. We then apply the SHAP method to investigate the interpretability of the machine learning model. We compare the performance of the NGBoost model with four other commonly used ensemble learning methods: Random Forest, GBDT, XGBoost, and LightGBM. The results show that the NGBoost model performs well on the test set and provides a probability distribution for each prediction. In addition, the variance of the predicted log's probability distribution can be used to assess the quality of the reconstructed log. Using SHAP, we calculate the importance of each input log to the predicted results as well as the coupling relationships among the input logs. Our findings reveal that the NGBoost model tends to predict larger slowness values when the neutron porosity and gamma ray readings are large, which is consistent with petrophysical models. Furthermore, the machine learning model captures the influence of the changing borehole caliper on slowness, even though that influence is complex and hard to express as a direct relationship. These findings are in line with the physical principles of borehole acoustics.
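The idea of using predictive variance as a quality indicator can be illustrated with a minimal sketch. It assumes the model returns a Normal distribution (mean and standard deviation) at each depth sample. The numbers, the 5 us/ft threshold, and the helper names `prediction_interval` and `flag_uncertain` are hypothetical and do not come from the paper or from the NGBoost library.

```python
from statistics import NormalDist


def prediction_interval(mean, std, coverage=0.95):
    """Return the central (lower, upper) interval of a Normal(mean, std)."""
    # z-score for the requested two-sided coverage, e.g. about 1.96 for 95%
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return mean - z * std, mean + z * std


def flag_uncertain(stds, threshold):
    """Return indices of depth samples whose predictive std exceeds threshold."""
    return [i for i, s in enumerate(stds) if s > threshold]


# Hypothetical per-depth predictive means and standard deviations for
# compressional slowness (us/ft); illustrative values only.
dtc_mean = [70.0, 85.0, 110.0, 95.0]
dtc_std = [1.5, 2.0, 9.0, 3.0]

intervals = [prediction_interval(m, s) for m, s in zip(dtc_mean, dtc_std)]
flagged = flag_uncertain(dtc_std, threshold=5.0)

for i, (lo, hi) in enumerate(intervals):
    note = "  <- low confidence" if i in flagged else ""
    print(f"sample {i}: {lo:.1f} to {hi:.1f} us/ft{note}")
```

A wide interval marks a depth where the reconstructed log should be treated with caution. In practice the threshold would have to be chosen per field or per log rather than fixed as it is here.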

