Traffic anomaly detection (TAD) in driving videos is critical for ensuring the safety of autonomous driving and advanced driver assistance systems. Previous single-stage TAD methods primarily rely on frame prediction, making them vulnerable to interference from dynamic backgrounds induced by the rapid movement of the dashboard camera. While two-stage TAD methods appear to be a natural solution that mitigates such interference by pre-extracting background-independent features (such as bounding boxes and optical flow) with perceptual algorithms, they are sensitive to the performance of the first-stage perceptual algorithms and may suffer from error propagation. In this paper, we introduce TTHF, a novel single-stage method that aligns video clips with text prompts, offering a new perspective on traffic anomaly detection. Unlike previous approaches, the supervision signal of our method is derived from language rather than orthogonal one-hot vectors, providing a more comprehensive representation. Furthermore, regarding visual representation, we propose to model the high-frequency component of driving videos in the temporal domain. This modeling captures the dynamic changes of driving scenes, enhances the perception of driving behavior, and significantly improves the detection of traffic anomalies. In addition, to better perceive various types of traffic anomalies, we carefully design an attentive anomaly focusing mechanism that visually and linguistically guides the model to adaptively focus on the visual context of interest, thereby facilitating the detection of traffic anomalies. Our proposed TTHF achieves promising performance, outperforming state-of-the-art competitors by +5.4% AUC on the DoTA dataset and demonstrating strong generalization on the DADA dataset.
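As a minimal sketch of the idea behind temporal high-frequency modeling, the dynamic (high-frequency) content of a clip can be exposed by a discrete temporal gradient, i.e., frame differencing, which suppresses static background and emphasizes fast scene changes. The function name and tensor layout below are illustrative assumptions, not the paper's actual TTHF implementation.

```python
import numpy as np

def temporal_high_frequency(clip: np.ndarray) -> np.ndarray:
    """clip: (T, H, W, C) array of frames with values in [0, 1].

    Returns (T-1, H, W, C) absolute frame-to-frame differences,
    a simple proxy for the temporal high-frequency component:
    static regions give ~0, fast-moving regions give large values.
    """
    return np.abs(np.diff(clip.astype(np.float32), axis=0))

# A static clip yields zero response; a moving pattern yields a strong one.
static = np.zeros((4, 8, 8, 3), dtype=np.float32)
moving = np.zeros((4, 8, 8, 3), dtype=np.float32)
for t in range(4):
    moving[t, t, :, :] = 1.0  # a bright row moving downward frame by frame

print(temporal_high_frequency(static).max())  # 0.0
print(temporal_high_frequency(moving).max())  # 1.0
```

In practice such a temporal-gradient signal would be fed alongside (or fused with) the raw frames so the visual encoder can attend to scene dynamics rather than appearance alone.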