The growing number of moving Internet-of-Things (IoT) devices has led to a surge in moving object data, powering applications such as traffic routing, hotspot detection, and weather forecasting. To manage such data, spatial database systems offer various index options and data formats, e.g., point-based or trajectory-based. At the same time, dataset characteristics such as geographic overlap and skew can vary significantly. Index choice, data format, and dataset characteristics all significantly affect database performance. Existing work has studied these factors, but none of it explores the effects and trade-offs that arise from combining all three. In this paper, we evaluate the performance impact of index choice, data format, and dataset characteristics on PostGIS, a popular spatial database system. We focus on two dataset characteristics, the degree of overlap and the degree of skew, and propose novel approximation methods to quantify them. We design a benchmark that compares a variety of spatial indexing strategies and data formats while also considering the impact of dataset characteristics on database performance. The benchmark includes a variety of real-world and synthetic datasets, write operations, and read queries to cover a broad range of scenarios that might occur at application runtime. Our results offer practical guidance for developers looking to optimize spatial storage and querying. They also provide insights into dataset characteristics and their impact on database performance.