Text-Driven Traffic Anomaly Detection with Temporal High-Frequency Modeling in Driving Videos

Traffic anomaly detection (TAD) in driving videos is critical for the safety of autonomous driving and advanced driver assistance systems. Previous single-stage TAD methods rely primarily on frame prediction, which makes them vulnerable to interference from the dynamic backgrounds induced by rapid movement of the dashboard camera. Two-stage TAD methods appear to be a natural way to mitigate such interference, since they pre-extract background-independent features (such as bounding boxes and optical flow) using perceptual algorithms. However, they depend on the performance of those first-stage perceptual algorithms, whose errors can propagate to the detection stage. In this paper, we introduce TTHF, a novel single-stage method that aligns video clips with text prompts, offering a new perspective on traffic anomaly detection. Unlike previous approaches, the supervision signal of our method is derived from language rather than from orthogonal one-hot vectors, providing a more comprehensive representation. Further, concerning visual representation, we propose to model the high-frequency components of driving videos in the temporal domain. This modeling captures the dynamic changes of driving scenes, enhances the perception of driving behavior, and significantly improves the detection of traffic anomalies. In addition, to better perceive various types of traffic anomalies, we carefully design an attentive anomaly focusing mechanism that visually and linguistically guides the model to adaptively focus on the visual context of interest, thereby facilitating the detection of traffic anomalies. Experiments show that the proposed TTHF achieves promising performance, outperforming state-of-the-art competitors by +5.4% AUC on the DoTA dataset and generalizing well to the DADA dataset.
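The two ideas named in the abstract, temporal high-frequency modeling and scoring a clip against text prompts, can be illustrated with a minimal sketch. This is not the TTHF implementation. It assumes that consecutive-frame differencing stands in for the temporal high-frequency signal, that clip and prompt embeddings are already available as plain vectors, and that a CLIP-style temperature of 0.07 is used. The function names and toy vectors are hypothetical.

```python
import math

def temporal_high_frequency(frames):
    """Difference consecutive frames: a simple temporal high-pass filter.

    ASSUMPTION: frame differencing stands in for the high-frequency
    modeling; the abstract does not specify the actual operator.
    Each frame is a flat list of pixel (or feature) values.
    """
    return [[b - a for a, b in zip(prev, curr)]
            for prev, curr in zip(frames, frames[1:])]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def anomaly_probability(clip_emb, normal_text_emb, anomaly_text_emb,
                        temperature=0.07):
    """Softmax over clip-to-prompt similarities; returns P(anomaly).

    ASSUMPTION: two prompts (normal vs. anomalous) and a hypothetical
    temperature value; the embeddings would come from learned encoders.
    """
    logits = [cosine(clip_emb, normal_text_emb) / temperature,
              cosine(clip_emb, anomaly_text_emb) / temperature]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)

if __name__ == "__main__":
    # Three toy "frames" of two values each; the last two are identical.
    print(temporal_high_frequency([[1, 2], [3, 5], [3, 5]]))
    # A toy clip embedding that points toward the anomaly prompt.
    print(anomaly_probability([0.1, 0.9], [1.0, 0.0], [0.0, 1.0]))
```

In this sketch a static scene yields an all-zero difference frame, while abrupt changes produce large values, which is the intuition behind emphasizing temporal high-frequency content. The text side replaces a one-hot class target with similarity to prompt embeddings.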

