Intelligent Traffic Monitoring (ITMo) technologies have the potential to improve road safety and security and to enable smart city infrastructure. Understanding traffic situations requires a complex fusion of perceptual information with domain-specific and causal commonsense knowledge. Whereas prior work has provided benchmarks and methods for traffic monitoring, it remains unclear whether models can effectively align these information sources and reason in novel scenarios. To address this assessment gap, we devise three novel text-based tasks for situational reasoning in the traffic domain: i) BDD-QA, which evaluates the ability of Language Models (LMs) to perform situational decision-making; ii) TV-QA, which assesses LMs' ability to reason about complex event causality; and iii) HDT-QA, which evaluates the ability of models to solve human driving exams. We adopt four knowledge-enhanced methods that have shown generalization capability across language reasoning tasks in prior work. These methods are based on natural language inference, commonsense knowledge-graph self-supervision, multi-QA joint training, and dense retrieval of domain information. We pair each method with a relevant knowledge source, including knowledge graphs, related benchmarks, and driving manuals. In extensive experiments, we benchmark various knowledge-aware methods on the three datasets under zero-shot evaluation. We provide in-depth analyses of model performance on data partitions and examine model predictions by category, yielding useful insights into traffic understanding under different background knowledge and reasoning strategies.