Monitoring volcanic activity is of paramount importance for safeguarding lives, infrastructure, and ecosystems, yet only a small fraction of known volcanoes are continuously monitored. Satellite-based Interferometric Synthetic Aperture Radar (InSAR) enables systematic, global-scale deformation monitoring, but the complexity of its data challenges traditional remote sensing methods. Deep learning offers a powerful means to automate and enhance InSAR interpretation, advancing volcanology and geohazard assessment. Despite this promise, progress has been limited by the scarcity of well-curated datasets. In this work, we build on the existing Hephaestus dataset and introduce Thalia, which addresses its crucial limitations and enriches its scope with higher-resolution, multi-source, and multi-temporal data. Thalia is a global collection of 38 spatiotemporal datacubes covering 7 years and integrating InSAR products, topographic data, and atmospheric variables, which are known to introduce signal delays that can mimic ground deformation in InSAR imagery. Each sample includes expert annotations detailing the type, intensity, and extent of deformation, accompanied by descriptive text. To enable fair and consistent evaluation, we provide a comprehensive benchmark of state-of-the-art models for classification and segmentation. This work fosters collaboration between machine learning and Earth science, advancing volcanic monitoring and promoting data-driven approaches in geoscience. The code and the latest version of the dataset are available through the GitHub repository: https://github.com/Orion-AI-Lab/Thalia