Automatic Identification System (AIS) data from ships is excellent for analyzing the movements of individual ships and for monitoring all ships within a specific area. However, AIS data must be cleaned, processed, and stored before it is usable. This paper presents a system consisting of an efficient, modular ETL process for loading AIS data and a distributed spatial data warehouse that stores ship trajectories. To analyze large sets of ships efficiently, a raster approach to querying AIS data is proposed. A spatially partitioned data warehouse with a granularized cell representation and a heatmap presentation is designed, developed, and evaluated. The data warehouse currently stores ~312 million kilometers of ship trajectories, and its largest table holds more than 8 billion rows. We find that searching the cell representation is faster than searching the trajectory representation. Further, we show that the spatially divided shards give a consistently good scale-up for both cell and heatmap analytics over large areas, ranging from 354% to 1164% when the number of workers is increased fivefold.
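The raster idea can be illustrated with a minimal sketch. It maps AIS position reports onto a regular grid, builds a heatmap, and computes the same heatmap shard by shard. Several details are assumptions and are not taken from the system described here. The grid is a simple longitude/latitude grid with a hypothetical resolution `CELL_SIZE_DEG`. Shards are square blocks of `CELLS_PER_SHARD_SIDE` cells. The heatmap value is the number of distinct ships (by MMSI) seen in a cell. The function names `to_cell`, `shard_of`, `build_heatmap`, and `build_heatmap_sharded` are illustrative only, not the warehouse's API.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical grid resolution in degrees (assumption: the real cell size and
# coordinate reference system of the warehouse are not specified here).
CELL_SIZE_DEG = 0.01
# Hypothetical shard size: each shard covers a square block of this many cells per side.
CELLS_PER_SHARD_SIDE = 100

Cell = Tuple[int, int]                # (column, row) index of a raster cell
Position = Tuple[int, float, float]   # (MMSI, longitude, latitude)


def to_cell(lon: float, lat: float, cell_size: float = CELL_SIZE_DEG) -> Cell:
    """Map a longitude/latitude pair to the index of the grid cell containing it."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def shard_of(cell: Cell, side: int = CELLS_PER_SHARD_SIDE) -> Tuple[int, int]:
    """Assign a cell to a spatial shard made of a contiguous square block of cells."""
    return (cell[0] // side, cell[1] // side)


def build_heatmap(positions: Iterable[Position]) -> Dict[Cell, int]:
    """Count the distinct ships (by MMSI) that reported a position in each cell."""
    ships: Dict[Cell, Set[int]] = defaultdict(set)
    for mmsi, lon, lat in positions:
        ships[to_cell(lon, lat)].add(mmsi)
    return {cell: len(mmsis) for cell, mmsis in ships.items()}


def build_heatmap_sharded(positions: Iterable[Position]) -> Dict[Cell, int]:
    """Group positions by spatial shard, build each shard's heatmap independently
    (as separate workers could), then merge the partial results. Every cell lies in
    exactly one shard, so merging is a union of disjoint dictionaries."""
    by_shard: Dict[Tuple[int, int], List[Position]] = defaultdict(list)
    for pos in positions:
        by_shard[shard_of(to_cell(pos[1], pos[2]))].append(pos)
    merged: Dict[Cell, int] = {}
    for shard_positions in by_shard.values():
        merged.update(build_heatmap(shard_positions))
    return merged


# Made-up example reports, for illustration only.
reports = [
    (219000001, 10.005, 56.005),
    (219000002, 10.006, 56.004),
    (219000001, 10.007, 56.008),  # same ship in the same cell: counted once
    (219000003, 12.503, 55.703),
]
print(build_heatmap(reports))          # → {(1000, 5600): 2, (1250, 5570): 1}
print(build_heatmap_sharded(reports) == build_heatmap(reports))  # → True
```

Because the spatial shards do not overlap, each worker can aggregate its own cells without coordinating with the others. This is one plausible reason why cell and heatmap analytics might scale well as workers are added. The sketch does not reproduce the measured scale-up.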