Computing shortest-path distances in road networks is a core operation in navigation systems, location-based services, and spatial analytics. Although classical algorithms such as Dijkstra's algorithm provide exact answers, their latency is prohibitive for modern real-time, large-scale deployments. Over the past two decades, numerous distance indexes have been proposed to speed up shortest-distance query processing. More recently, with advances in machine learning (ML), researchers have proposed ML-based distance indexes that answer approximate shortest-path and distance queries efficiently. However, a comprehensive and systematic evaluation of these ML-based approaches is lacking. This paper presents the first empirical survey of ML-based distance indexes on road networks, evaluating them along four key dimensions: training time, query latency, storage, and accuracy. Using seven real-world road networks and workload-driven query datasets derived from trajectory data, we benchmark ten representative ML techniques and compare them against strong classical non-ML baselines, highlighting key insights and practical trade-offs. We release a unified open-source codebase to support reproducibility and future research on learned distance indexes.