FT-AED: Benchmark Dataset for Early Freeway Traffic Anomalous Event Detection

Early and accurate detection of anomalous events on the freeway, such as accidents, can improve emergency response and clearance. However, delays and errors in event identification and reporting make this a difficult problem. Current large-scale freeway traffic datasets are not designed for anomaly detection and ignore these challenges. In this paper, we introduce the first large-scale lane-level freeway traffic dataset for anomaly detection. Our dataset consists of a month of weekday radar detection sensor data collected in four lanes along an 18-mile stretch of Interstate 24 heading toward Nashville, TN, comprising over 3.7 million sensor measurements. We also collect official crash reports from the Nashville Traffic Management Center and manually label all other potential anomalies in the dataset. To show the potential of our dataset for future machine learning and traffic research, we benchmark numerous deep learning anomaly detection models on it. We find that unsupervised graph neural network autoencoders are a promising solution for this problem and that ignoring spatial relationships leads to decreased performance. We demonstrate that our methods can reduce reporting delays by over 10 minutes on average while detecting 75% of crashes. Our dataset and all preprocessing code needed to get started are publicly released at https://vu.edu/ft-aed/ to facilitate future research.
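
To illustrate the general idea behind unsupervised autoencoder-based anomaly detection mentioned above, the following is a minimal sketch and not the models benchmarked in the paper. It swaps the graph neural network autoencoder for a much simpler linear autoencoder (PCA computed with NumPy), fits it on "normal" data, and flags time steps whose reconstruction error exceeds a threshold. Everything here is an assumption made for illustration: the data is synthetic (two hypothetical sensors with four lanes each, where lanes at the same sensor move together), and the function names, the 99th-percentile threshold, and the injected anomaly are not taken from the dataset or its code.

```python
import numpy as np


def fit_linear_autoencoder(X, k):
    """Fit a rank-k linear autoencoder (PCA). Returns (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(X, mean, components):
    """Mean squared reconstruction error for each row of X."""
    centered = X - mean
    recon = centered @ components.T @ components
    return np.mean((centered - recon) ** 2, axis=1)


rng = np.random.default_rng(0)

# Synthetic stand-in for lane-level readings (NOT real FT-AED data):
# 2 hypothetical sensors x 4 lanes = 8 features; lanes at one sensor co-vary.
mixing = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
], dtype=float)
latent = rng.normal(size=(500, 2))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 8))

mean, comps = fit_linear_autoencoder(normal, k=2)
train_err = reconstruction_error(normal, mean, comps)

# Assumed rule: flag anything above the 99th percentile of training error.
threshold = np.quantile(train_err, 0.99)

# Inject a hypothetical anomaly: one lane deviates from its neighbors.
anomalous = normal[:5].copy()
anomalous[:, 0] += 5.0

scores = reconstruction_error(anomalous, mean, comps)
flags = scores > threshold
print(flags)
```

Because the model only learns how lanes normally move together, a single lane breaking that pattern produces a large reconstruction error even though no anomaly labels were used in fitting. A graph-based autoencoder extends the same scoring idea by also modeling relationships between neighboring sensors along the road.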

