The availability of Global Positioning System (GPS) trajectory data is increasing along with the availability of different GPS receivers and the growing use of various mobility services. GPS trajectories are an important data source for traffic density detection, transport mode detection, and map data inference, using methods such as image processing and machine learning. As data size increases, representing this type of data efficiently for use in these methods becomes more difficult. A common approach is to represent GPS trajectory information, such as average speed and bearing, in raster image form and then apply analysis methods. In this study, we evaluate GPS trajectory data rasterization using the spatial join functions of QGIS, PostGIS+QGIS, and our iterative spatial structured grid aggregation implementation written in the Python programming language. Our implementation is also parallelizable, and the parallelized version is included as the fourth method. According to the results of an experiment carried out with an example GPS trajectory dataset, the QGIS and PostGIS+QGIS methods showed relatively low performance compared with our method in terms of total processing time. The PostGIS+QGIS method achieved the best results for the spatial join step, although its overall performance degraded quickly as the test area size increased. In contrast, the performance of both of our methods degrades only in direct proportion to the number of GPS points. Moreover, the performance of our methods can be increased in proportion to the number of processor cores and/or by using multiple computing clusters.
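To make the idea of structured grid aggregation concrete, the following is a minimal sketch in Python, not the implementation evaluated in this study. It assumes points are given as `(x, y, speed)` tuples in a projected coordinate system, and the function name `rasterize_mean_speed` and its parameters `x_min`, `y_min`, and `cell_size` are hypothetical names chosen for illustration. Each point is assigned to a cell by integer division of its offset from the grid origin, so no spatial join is needed.

```python
import math
from collections import defaultdict


def rasterize_mean_speed(points, x_min, y_min, cell_size):
    """Aggregate (x, y, speed) points into square grid cells.

    Returns a dict mapping (row, col) to the mean speed of the points
    that fall in that cell. All names here are illustrative assumptions.
    """
    sums = defaultdict(float)   # running sum of speeds per cell
    counts = defaultdict(int)   # number of points per cell
    for x, y, speed in points:
        # The cell index follows directly from the point's offset
        # relative to the grid origin; no geometry test is required.
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y - y_min) / cell_size)
        sums[(row, col)] += speed
        counts[(row, col)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Three sample points on a grid with origin (0, 0) and 1-unit cells.
points = [(0.5, 0.5, 10.0), (0.7, 0.2, 20.0), (1.5, 0.5, 30.0)]
grid = rasterize_mean_speed(points, x_min=0.0, y_min=0.0, cell_size=1.0)
```

Because the loop visits each point once, the cost of this sketch grows linearly with the number of GPS points and does not depend on the extent of the test area. The per-cell sums and counts are also additive, so separate chunks of points could in principle be aggregated independently and merged afterwards, which is one plausible way such an approach might be parallelized.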