Modeling traffic dynamics is a critical challenge for urban computing, with applications ranging from real-time traffic management to infrastructure planning. However, progress in this area is fundamentally constrained by a lack of large-scale public datasets that capture the subtle properties of real city road networks. Existing benchmarks are often limited by their small scale, reliance on sparse highway traffic sensors, absence of true road connectivity information, and lack of information about road properties. To address this issue, we introduce datasets representing fine-grained road networks of two major cities, which are unique in their scale (up to 100,000 road segments), use of real road connectivity, presence of time series measurements for both traffic speed and volume at a 5-minute resolution, and inclusion of rich static road attributes. These datasets enable in-depth analysis of spatiotemporal traffic patterns and can serve as benchmarks for various ML applications. As a practical demonstration of the utility of our datasets and the challenges they present, we use them for the task of traffic forecasting. The size of the real-world road networks in our datasets reveals significant scalability issues in current traffic forecasting models. To address these issues, we propose a simple and efficient baseline that not only scales to large road graphs but also achieves forecasting performance competitive with other established spatiotemporal models. We hope that the proposed datasets will serve as a foundational resource for a broad range of research in traffic modeling, urban computing, and smart city development.