Subsurface datasets inherently have big data characteristics such as vast volume, diverse features, and high sampling rates, and they are further affected by the curse of dimensionality arising from numerous physical, engineering, and geological inputs. Among existing dimensionality reduction (DR) methods, nonlinear dimensionality reduction (NDR) methods, especially metric multidimensional scaling (MDS), are preferred for subsurface datasets because of the inherent complexity of these data. While MDS retains the intrinsic data structure and quantifies uncertainty, it has two limitations: its solutions are not unique and are unstable from run to run, because they are invariant to Euclidean transformations (translation, rotation, and reflection), and it has no extension for out-of-sample points (OOSP). To enhance subsurface inferential and machine learning workflows, datasets must be transformed into stable, reduced-dimension representations that accommodate OOSP. Our solution employs rigid transformations to obtain a stabilized, Euclidean-transformation-invariant representation in the lower-dimensional space (LDS). By computing an MDS input dissimilarity matrix and applying rigid transformations to multiple realizations, we ensure transformation invariance and integrate OOSP. The process uses a convex hull algorithm and incorporates a loss function and normalized stress to quantify distortion. We validate our approach on synthetic data, with several distance metrics, and on real-world wells from the Duvernay Formation. The results confirm that our method achieves consistent LDS representations. Furthermore, our proposed "stress ratio" (SR) metric provides insight into uncertainty, which is beneficial for model adjustments and inferential analysis. Consequently, our workflow promises improved repeatability and comparability of NDR in subsurface energy resource engineering and associated big data workflows.
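The stabilization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain SMACOF iteration for metric MDS, an orthogonal Procrustes fit (rotation or reflection plus translation, no scaling) as the rigid transformation, and a common definition of normalized stress. The function names (`smacof_mds`, `rigid_align`, `normalized_stress`) and the synthetic data are hypothetical, and the convex hull step, the OOSP extension, and the stress ratio are not covered.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def smacof_mds(D, n_components=2, seed=0, n_iter=300):
    """Metric MDS by SMACOF (Guttman transform) from a random start.

    Assumption: a fixed iteration count stands in for a convergence test.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, n_components))
    for _ in range(n_iter):
        dist = pairwise_distances(Z)
        # d_ij / dist_ij, set to zero where two embedded points coincide
        ratio = np.divide(D, dist, out=np.zeros_like(D), where=dist > 0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        Z = B @ Z / n
    return Z

def normalized_stress(D, Z):
    """sqrt( sum (d_ij - dist_ij)^2 / sum d_ij^2 ) over pairs i < j."""
    iu = np.triu_indices_from(D, k=1)
    resid = D[iu] - pairwise_distances(Z)[iu]
    return np.sqrt((resid ** 2).sum() / (D[iu] ** 2).sum())

def rigid_align(Z, ref):
    """Align Z to ref with rotation/reflection and translation only."""
    Zc = Z - Z.mean(axis=0)
    ref_mean = ref.mean(axis=0)
    # Orthogonal Procrustes: R minimizes ||Zc @ R - (ref - ref_mean)||_F
    U, _, Vt = np.linalg.svd(Zc.T @ (ref - ref_mean))
    return Zc @ (U @ Vt) + ref_mean

# Synthetic stand-in for a subsurface dataset: 30 samples, 5 features.
X = np.random.default_rng(42).normal(size=(30, 5))
D = pairwise_distances(X)  # MDS input dissimilarity matrix

# Several MDS realizations that differ only in their random start.
realizations = [smacof_mds(D, seed=s) for s in range(5)]
ref = realizations[0]
aligned = [rigid_align(Z, ref) for Z in realizations]

for Z, A in zip(realizations, aligned):
    print(f"stress={normalized_stress(D, A):.4f}  "
          f"misfit to reference: raw={np.linalg.norm(Z - ref):.3f}, "
          f"aligned={np.linalg.norm(A - ref):.3f}")
```

Because a rigid transformation preserves all pairwise distances, the normalized stress of each realization is unchanged by the alignment; only its pose in the LDS changes, which is what makes realizations comparable across runs.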