The increasing use of deep learning techniques has reduced interpretation time and, ideally, interpreter bias by automatically deriving geological maps from digital outcrop models. However, accurately validating these automated mapping approaches is a significant challenge, owing to the subjective nature of geological mapping and the difficulty of collecting quantitative validation data. Additionally, many state-of-the-art deep learning methods are limited to 2D image data, which is insufficient for 3D digital outcrops such as hyperclouds. To address these challenges, we present Tinto, a multi-sensor benchmark digital outcrop dataset designed to facilitate the development and validation of deep learning approaches for geological mapping, especially for unstructured 3D data such as point clouds. Tinto comprises two complementary sets: 1) a real digital outcrop model from Corta Atalaya (Spain), with spectral attributes and ground-truth data, and 2) a synthetic twin that uses latent features in the original datasets to reconstruct realistic spectral data (including sensor noise and processing artifacts) from the ground truth. The point cloud is dense and contains 3,242,964 labeled points. We used these datasets to explore the abilities of different deep learning approaches for automated geological mapping. By making Tinto publicly available, we hope to foster the development and adaptation of new deep learning tools for 3D applications in Earth sciences. The dataset can be accessed at https://doi.org/10.14278/rodare.2256.