This work addresses the task of modeling spatiotemporal traffic patterns directly from overhead imagery, which we refer to as image-driven traffic modeling. We extend prior work on this task by introducing a multi-modal, multi-task, transformer-based segmentation architecture that can be used to create dense, city-scale traffic models. Our approach includes a geo-temporal positional encoding module for integrating geo-temporal context and a probabilistic objective function for estimating traffic speeds that naturally models temporal variations. We evaluate our method extensively on the Dynamic Traffic Speeds (DTS) benchmark dataset and significantly improve on the state of the art. Finally, we introduce the DTS++ dataset to support mobility-related location adaptation experiments.