Road network data provides rich information about cities, but processing worldwide OpenStreetMap (OSM) data is computationally intensive, and the resulting graphs are often difficult to unify for benchmarking downstream tasks. Existing graph learning benchmarks fail to capture the billion-scale size and distinctive topological properties of real-world road networks, leaving model scalability underexplored. To close this gap, we process OSM data with distributed cloud computing on 5,000 cores and release \textbf{OSM+}, a structured, worldwide road network graph dataset with 1 billion vertices, designed for high accessibility and usability. OSM+ is open source and downloadable worldwide, and it provides an open-box graph structure and an easy-to-use spatial query interface. The evaluated release is a fixed snapshot for reproducibility, and future releases will follow a versioned update plan. We demonstrate the utility of OSM+ through three illustrative use cases: city boundary detection, traffic prediction, and traffic policy control. For traffic prediction, we construct a new 31-city benchmark by processing traffic data and combining it with OSM+. This benchmark offers broader spatial coverage and more comprehensive evaluation than commonly used datasets, and it scales the road networks from hundreds of intersections to thousands. For traffic policy control, we release a new six-city dataset at a much larger scale, which introduces the challenge of coordinating thousands of agents. We also provide data processing tools for integrating multimodal spatial-temporal data with OSM+ to support geospatial foundation model training and downstream scientific analysis.