Modern in-orbit satellites and other remote sensing instruments have produced a large volume of public data waiting to be exploited, stored in different formats and hosted on different servers. In this context, the ETL (extract, transform, load) formalism becomes relevant for integrating and analyzing the combined information from all these sources. In this work, we present the theoretical and practical foundations for building a modular analysis infrastructure that supports the creation of ETL pipelines to download, transform, and integrate data from different instruments in different formats. Part of this work is already implemented in a Python library intended to be integrated into existing workflow management tools based on directed acyclic graphs, which also provide adapters for loading the combined data into different data warehouses.
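To make the idea of a modular ETL expressed as a directed acyclic graph concrete, the following is a minimal illustrative sketch and not the library described above. It assumes two hypothetical instruments whose data arrive as CSV and JSON; the payloads, field names, task names, and the in-memory `WAREHOUSE` list are all invented for illustration. Task ordering relies only on the Python standard library (`graphlib.TopologicalSorter`), standing in for a real workflow manager.

```python
import csv
import io
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical payloads standing in for files downloaded from two servers.
CSV_PAYLOAD = "time,lat,lon,value\n2020-01-01T00:00:00Z,10.0,20.0,1.5\n"
JSON_PAYLOAD = '[{"t": "2020-01-01T00:00:00Z", "position": [10.0, 20.0], "v": 2.5}]'

# In-memory stand-in for a warehouse adapter (assumption, not a real backend).
WAREHOUSE = []


def extract_csv(results):
    """Extract: parse the CSV payload into a list of raw string-valued rows."""
    return list(csv.DictReader(io.StringIO(CSV_PAYLOAD)))


def extract_json(results):
    """Extract: parse the JSON payload into a list of raw records."""
    return json.loads(JSON_PAYLOAD)


def transform_csv(results):
    """Transform: map CSV rows onto a common schema with numeric fields."""
    return [
        {"time": r["time"], "lat": float(r["lat"]), "lon": float(r["lon"]),
         "value": float(r["value"]), "source": "instrument_a"}
        for r in results["extract_csv"]
    ]


def transform_json(results):
    """Transform: map JSON records onto the same common schema."""
    return [
        {"time": r["t"], "lat": r["position"][0], "lon": r["position"][1],
         "value": r["v"], "source": "instrument_b"}
        for r in results["extract_json"]
    ]


def integrate(results):
    """Integrate: merge both sources and order records by time, then source."""
    merged = results["transform_csv"] + results["transform_json"]
    return sorted(merged, key=lambda rec: (rec["time"], rec["source"]))


def load(results):
    """Load: append integrated records to the warehouse; return the row count."""
    WAREHOUSE.extend(results["integrate"])
    return len(results["integrate"])


TASKS = {
    "extract_csv": extract_csv,
    "extract_json": extract_json,
    "transform_csv": transform_csv,
    "transform_json": transform_json,
    "integrate": integrate,
    "load": load,
}

# Each task maps to the set of tasks that must finish before it runs.
DEPENDENCIES = {
    "extract_csv": set(),
    "extract_json": set(),
    "transform_csv": {"extract_csv"},
    "transform_json": {"extract_json"},
    "integrate": {"transform_csv", "transform_json"},
    "load": {"integrate"},
}


def run_pipeline(tasks, dependencies):
    """Run every task in a topological order, passing earlier results along."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = tasks[name](results)
    return results


if __name__ == "__main__":
    run_pipeline(TASKS, DEPENDENCIES)
    for record in WAREHOUSE:
        print(record)
```

Keeping each step as an independent function keyed by name means a new instrument only adds an extract/transform pair and one edge into `integrate`. In the same spirit, the hypothetical `load` step could be swapped for an adapter targeting a real warehouse without touching the upstream tasks.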