Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies

Idling vehicle detection (IVD) can help monitor and reduce unnecessary idling, and it can be integrated into real-time systems to address the resulting pollution and harmful byproducts. The previous approach [13] is not end-to-end: it requires extra user clicks to specify part of the input, which makes system deployment more error-prone or even infeasible. In contrast, we introduce an end-to-end joint audio-visual IVD task that visually detects vehicles and classifies each into one of three states: moving, idling, or engine off. Unlike feature co-occurrence tasks such as audio-visual vehicle tracking, our IVD task relies on complementary features, where labels cannot be determined from either modality alone. To this end, we propose AVIVD-Net, a novel network that integrates audio and visual features through a bidirectional attention mechanism. By learning a joint feature space, AVIVD-Net streamlines the input process and reduces the deployment complexity of previous methods. Additionally, we introduce the AVIVD dataset, which is seven times larger than previous datasets and offers significantly more annotated samples for studying the IVD problem. Our model achieves performance comparable to that of prior approaches, making it suitable for automated deployment. Furthermore, by evaluating AVIVD-Net on MAVD [23], a public feature co-occurrence dataset, we demonstrate its potential for extension to self-driving vehicle video-camera setups.
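The abstract does not describe how AVIVD-Net's bidirectional attention is built. Purely as an illustration of the general idea, the NumPy sketch below assumes a generic bidirectional cross-attention: visual tokens attend to audio tokens and audio tokens attend to visual tokens, each with a residual connection, followed by a linear head that scores every visual token over the three states. The function names, the shared feature dimension `d`, the random weights, and the per-region classification head are all hypothetical; this is not the authors' implementation.

```python
import numpy as np

# The three vehicle states named in the abstract.
STATES = ("moving", "idling", "engine off")


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    # Scaled dot-product attention: each query token attends over all context tokens.
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_queries, n_context)
    return softmax(scores, axis=-1) @ v       # (n_queries, d)


def init_params(d, n_states, rng):
    # Hypothetical random weights; assumes both modalities share dimension d.
    def proj():
        return tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    return {
        "v2a": proj(),  # visual queries attending to audio
        "a2v": proj(),  # audio queries attending to visual
        "head": rng.standard_normal((d, n_states)) / np.sqrt(d),
    }


def bidirectional_fusion(visual, audio, params):
    # Each modality is enriched by the other; residuals keep its own features.
    visual_out = visual + cross_attention(visual, audio, *params["v2a"])
    audio_out = audio + cross_attention(audio, visual, *params["a2v"])
    return visual_out, audio_out


def classify_vehicles(visual, audio, params):
    # Per-visual-token probabilities over STATES (hypothetical head).
    fused_visual, _ = bidirectional_fusion(visual, audio, params)
    return softmax(fused_visual @ params["head"], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    visual = rng.standard_normal((4, d))   # e.g. 4 hypothetical vehicle-region features
    audio = rng.standard_normal((10, d))   # e.g. 10 hypothetical audio-frame features
    params = init_params(d, len(STATES), rng)
    probs = classify_vehicles(visual, audio, params)
    print(probs.shape)  # (4, 3)
```

The visual branch carries the final prediction here because the task detects vehicles visually, while the audio-to-visual direction shows how information can flow both ways. In this sketch the state of each vehicle depends on both modalities, which matches the complementary-feature setting the abstract describes.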
