Intelligent Traffic Monitoring (ITMo) technologies have the potential to improve road safety and security and to enable smart city infrastructure. Understanding traffic situations requires a complex fusion of perceptual information with domain-specific and causal commonsense knowledge. Whereas prior work has provided benchmarks and methods for traffic monitoring, it remains unclear whether models can effectively align these information sources and reason in novel scenarios. To address this assessment gap, we devise three novel text-based tasks for situational reasoning in the traffic domain: i) BDD-QA, which evaluates the ability of Language Models (LMs) to perform situational decision-making; ii) TV-QA, which assesses LMs' ability to reason about complex event causality; and iii) HDT-QA, which evaluates the ability of models to solve human driving exams. We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work, based on natural language inference, commonsense knowledge-graph self-supervision, multi-QA joint training, and dense retrieval of domain information. We associate each method with a relevant knowledge source, including knowledge graphs, relevant benchmarks, and driving manuals. In extensive experiments, we benchmark various knowledge-aware methods on the three datasets under zero-shot evaluation. We also analyze model performance on data partitions in depth and examine model predictions by category, yielding insights into traffic understanding under different background knowledge and reasoning strategies.