Robust representations of oil wells' intervals via sparse attention mechanism

Determining the characteristics of newly drilled wells (e.g. reservoir formation properties) is a major challenge. One of the corresponding tasks is well-interval similarity assessment: if we can learn to predict which oilfields are rich and which are not by comparing them with existing ones, this will lead to significant cost reductions. Applying machine learning to oil and gas data imposes three main requirements: high quality even on unreliable data, low manual effort, and interpretability of the model itself. Neural networks can address these challenges, and a self-supervised paradigm allows models to be constructed automatically. However, existing approaches lack interpretability, and their quality is too low for practical applications. In particular, recurrent approaches such as LSTM have limited memory of distant inputs and pay more attention to the end of a sequence. By contrast, neural networks with the Transformer architecture attend to the entire sequence when making a decision. To make them more computationally efficient and more robust to noisy or missing values, we introduce a limited attention mechanism, similar to that of the Informer architecture, that considers only the top correspondences. We run experiments on an open dataset with more than $20$ wells, which supports the reliability of our results and their suitability for industrial use. The best results were obtained with our adaptation of the Informer variant of Transformer, with ROC AUC $0.982$. It outperforms classical approaches (ROC AUC $0.824$), recurrent neural networks (RNNs, ROC AUC $0.934$) and the direct use of Transformer (ROC AUC $0.961$). We show that well-interval representations obtained by Informer are of higher quality than those extracted by RNNs. Moreover, the obtained attention is interpretable, as it corresponds to the importance of a particular part of an interval for the similarity estimation.
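
To make the idea of attention that considers only top correspondences concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple per-query top-k rule: each query keeps its `top_k` largest scaled dot-product scores and ignores the rest. The function name `topk_sparse_attention`, the parameter `top_k`, and the selection rule are illustrative assumptions; Informer's own ProbSparse attention selects dominant queries by a sparsity measure, which this sketch does not reproduce.

```python
import numpy as np


def topk_sparse_attention(q, k, v, top_k):
    """Scaled dot-product attention restricted to the top_k keys per query.

    Illustrative assumption: sparsity is imposed per query by keeping only
    the top_k largest scores. q: (n_q, d), k: (n_k, d), v: (n_k, d_v).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (n_q, n_k) similarity scores

    # Per-row threshold: the top_k-th largest score of each query.
    kth = np.partition(scores, -top_k, axis=-1)[:, -top_k][:, None]

    # Scores below the threshold are masked out (ties may keep extra keys).
    masked = np.where(scores >= kth, scores, -np.inf)

    # Numerically stable softmax; masked entries get weight exactly 0.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    weights /= weights.sum(axis=-1, keepdims=True)

    return weights @ v, weights
```

Called on an interval of well-log feature vectors (for example with `q`, `k` and `v` all derived from the same sequence), the returned `weights` matrix has at most `top_k` nonzero entries per row, barring ties. This is one way such a mechanism could reduce the influence of noisy or missing readings while keeping the attention map easy to inspect.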
