Real-time traffic and sensor data from connected vehicles (CVs) can yield insights that immediately benefit the management of transportation infrastructure and adjacent services. However, the growth of electric vehicles (EVs) and CVs has generated an abundance of CV and sensor data that strains the processing capabilities of existing data center infrastructure. As a result, the benefits are either delayed or not fully realized. To address this issue, we propose a solution for processing state-wide CV traffic and sensor data on GPUs that provides real-time micro-scale insights in both temporal and spatial dimensions. This is achieved through the use of the NVIDIA RAPIDS framework and a Dask parallel cluster in Python. Our findings demonstrate a 70x acceleration in the extraction, transformation, and loading (ETL) of CV data for the State of Missouri for a full day of all unique CV journeys, reducing the processing time from approximately 48 hours to just 25 minutes. Because these results cover thousands of CVs and several thousand individual journeys with sub-second sensor data, they imply that we can model and obtain actionable insights for the management of the transportation infrastructure.
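As a rough illustration of the kind of per-journey ETL step described above, the following minimal sketch reduces sub-second waypoint rows to one summary row per journey. It is not the authors' pipeline: the column names (`journey_id`, `timestamp`, `speed_mph`), the 0 to 150 mph plausibility filter, and the function `summarize_journeys` are all hypothetical. The sketch uses cuDF (the RAPIDS DataFrame library) when it is installed and falls back to pandas otherwise, since cuDF mirrors much of the pandas API. Distribution across a Dask cluster is deliberately left out.

```python
# Use cuDF (RAPIDS) on a GPU machine; fall back to pandas elsewhere.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf


def summarize_journeys(waypoints):
    """Reduce per-waypoint sensor rows to one summary row per journey.

    Expects hypothetical columns: journey_id, timestamp (epoch seconds),
    and speed_mph.
    """
    # Transform: drop rows with a missing speed reading.
    clean = waypoints.dropna(subset=["speed_mph"])
    # Transform: keep only plausible speeds (assumed bounds, for illustration).
    clean = clean[(clean["speed_mph"] >= 0) & (clean["speed_mph"] <= 150)]
    # Duplicate the timestamp so a plain dict aggregation can take min and max.
    clean = clean.assign(t_start=clean["timestamp"], t_end=clean["timestamp"])
    # Aggregate: one row per journey.
    summary = clean.groupby("journey_id").agg(
        {"t_start": "min", "t_end": "max", "speed_mph": "mean"}
    )
    summary = summary.reset_index().sort_values("journey_id")
    summary["duration_s"] = summary["t_end"] - summary["t_start"]
    return summary


# Extract: a tiny in-memory stand-in for a day of CV waypoint records.
waypoints = xdf.DataFrame(
    {
        "journey_id": ["a", "a", "a", "b", "b"],
        "timestamp": [0.0, 0.5, 1.0, 10.0, 10.5],
        "speed_mph": [30.0, 32.0, None, 55.0, 400.0],
    }
)

# Load: here the result is only printed; a real pipeline would write it out.
print(summarize_journeys(waypoints))
```

In a multi-GPU setting, the same function could in principle be applied per partition of a Dask DataFrame, with journeys first shuffled so that each journey's rows share a partition. That step is an assumption about how such a pipeline might be organized, not a description of the reported system.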