Representation learning in the oil & gas industry aims to construct a model that maps the logging data of a well interval to an informative representation. Previous attempts are mainly supervised and focus on the similarity task, which estimates the closeness between intervals. We aim to build informative representations without supervised (labelled) data. One possible approach is self-supervised learning (SSL). Unlike the supervised paradigm, SSL requires few or no labels. Most current SSL approaches are either contrastive or non-contrastive. Contrastive methods pull the representations of similar (positive) objects closer together and push those of different (negative) objects apart. Because positive and negative pairs can be mislabelled, these methods may show inferior performance. Non-contrastive methods do not rely on such labelling and are widespread in computer vision. They learn from pairs of similar objects only, and such pairs are easier to identify in logging data. We are the first to introduce non-contrastive SSL for well-logging data. In particular, we use the Bootstrap Your Own Latent (BYOL) and Barlow Twins methods, which avoid negative pairs and focus only on matching positive pairs. The crucial part of these methods is the augmentation strategy. Together, our augmentation strategies and our adaptation of BYOL and Barlow Twins achieve superior clustering quality and the best performance on most of the classification tasks considered. Our results demonstrate the usefulness of the proposed non-contrastive self-supervised approaches for representation learning, and for interval similarity in particular.
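To make the non-contrastive idea concrete, the following is a minimal NumPy sketch of a Barlow Twins style objective applied to two augmented views of the same well intervals. It is an illustration under stated assumptions, not the implementation used in this work: the `augment` function (Gaussian noise plus random dropping of log curves), the linear stand-in encoder, the synthetic data, and the weight `lam` are all hypothetical choices made for the example.

```python
import numpy as np


def augment(intervals, rng, noise_std=0.05, drop_prob=0.2):
    """Hypothetical augmentation for a batch of shape (batch, depth, curves).

    Adds Gaussian noise and zeroes out randomly chosen log curves per sample.
    """
    noisy = intervals + rng.normal(0.0, noise_std, size=intervals.shape)
    # One keep/drop flag per sample and per log curve, broadcast over depth.
    keep = rng.random((intervals.shape[0], 1, intervals.shape[-1])) >= drop_prob
    return noisy * keep


def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-8):
    """Barlow Twins style loss for two embedding batches of shape (batch, dim).

    Only positive pairs are used: row i of z_a and row i of z_b are two views
    of the same interval. `lam` is an assumed trade-off weight.
    """
    n = z_a.shape[0]
    # Standardise each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    c = z_a.T @ z_b / n  # cross-correlation matrix, shape (dim, dim)
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)           # make the two views agree
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # decorrelate dimensions
    return on_diag + lam * off_diag


# Usage on synthetic data: 64 intervals, 100 depth steps, 4 log curves.
rng = np.random.default_rng(0)
batch = rng.normal(size=(64, 100, 4))
weights = rng.normal(size=(100 * 4, 16))  # stand-in for a trained encoder


def encode(x):
    """Flatten each interval and project it to a 16-dimensional embedding."""
    return x.reshape(x.shape[0], -1) @ weights


loss = barlow_twins_loss(encode(augment(batch, rng)), encode(augment(batch, rng)))
print(f"loss on two augmented views: {loss:.4f}")
```

The loss needs no negative pairs: the diagonal term rewards agreement between the two views of each interval, while the off-diagonal term discourages redundant embedding dimensions. In practice the encoder would be a trained neural network rather than a fixed random projection.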