The increasing use of deep learning techniques has reduced interpretation time and, ideally, interpreter bias by automatically deriving geological maps from digital outcrop models. However, validating these automated mapping approaches accurately is a significant challenge, owing to the subjective nature of geological mapping and the difficulty of collecting quantitative validation data. Additionally, many state-of-the-art deep learning methods are limited to 2D image data, which is insufficient for 3D digital outcrops such as hyperclouds. To address these challenges, we present Tinto, a multi-sensor benchmark digital outcrop dataset designed to facilitate the development and validation of deep learning approaches for geological mapping, especially for unstructured 3D data such as point clouds. Tinto comprises two complementary sets: 1) a real digital outcrop model from Corta Atalaya (Spain), with spectral attributes and ground-truth data, and 2) a synthetic twin that uses latent features in the original datasets to reconstruct realistic spectral data (including sensor noise and processing artifacts) from the ground truth. The point cloud is dense and contains 3,242,964 labeled points. We used these datasets to explore the abilities of different deep learning approaches for automated geological mapping. By making Tinto publicly available, we hope to foster the development and adaptation of new deep learning tools for 3D applications in Earth sciences. The dataset can be accessed at https://doi.org/10.14278/rodare.2256.