WOMD-Reasoning: A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning

We propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a language annotation dataset built on WOMD that focuses on describing and reasoning about interactions and intentions in driving scenarios. Previous language datasets primarily captured interactions caused by close distances. However, interactions induced by traffic rules and human intentions, which can occur over long distances, are not yet sufficiently covered, even though they are very common and more challenging for prediction or planning models to understand. WOMD-Reasoning therefore focuses extensively on these interactions, providing a total of 409k Q&As for varying types of interactions. In addition, WOMD-Reasoning is by far the largest Q&A dataset on real-world driving scenarios, with around 3 million Q&As covering a range of autonomous driving topics, from map descriptions and motion status descriptions to narratives and analyses of agents' interactions, behaviors, and intentions. This extensive textual information enables fine-tuning of driving-related Large Language Models (LLMs) for a wide range of applications such as scene description, prediction, and planning. By incorporating interaction and intention language from WOMD-Reasoning, we observe significant improvements in the performance of the state-of-the-art trajectory prediction model Multipath++, with gains of 10.14% in $MR_6$ and 6.90% in $minFDE_6$, demonstrating the effectiveness of WOMD-Reasoning. We hope WOMD-Reasoning will empower LLMs in driving to offer better interaction understanding and behavioral reasoning. The dataset is available at https://waymo.com/open/download.
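The abstract reports gains in $MR_6$ and $minFDE_6$ without defining them. As a rough illustration only, the sketch below computes both for predictors that output 6 candidate trajectories per agent. It assumes a single 2.0 m distance threshold for a "miss", which is a simplification chosen here for illustration and not necessarily the thresholding used in the reported evaluation; the function names `min_fde` and `miss_rate` are hypothetical.

```python
import numpy as np

def min_fde(pred, gt):
    """Minimum final displacement error.

    pred: array of shape (K, T, 2), K candidate trajectories (K = 6 for minFDE_6).
    gt:   array of shape (T, 2), the ground-truth trajectory.
    Returns the smallest Euclidean distance between any candidate's
    final point and the ground-truth final point.
    """
    final_err = np.linalg.norm(pred[:, -1, :] - gt[-1, :], axis=-1)
    return float(final_err.min())

def miss_rate(preds, gts, threshold=2.0):
    """Fraction of agents whose best candidate still ends farther than
    `threshold` from the ground truth (assumed, simplified miss criterion)."""
    misses = [min_fde(p, g) > threshold for p, g in zip(preds, gts)]
    return float(np.mean(misses))

def make_candidates(gt, offsets):
    """Toy helper: copy gt once per offset and shift each copy's final point in y."""
    pred = np.repeat(gt[None, :, :], len(offsets), axis=0).astype(float)
    pred[:, -1, 1] += np.asarray(offsets, dtype=float)
    return pred

# Toy example: one straight ground-truth path, two agents with 6 candidates each.
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
agent_a = make_candidates(gt, [1, 2, 3, 4, 5, 6])  # best candidate ends 1 m away
agent_b = make_candidates(gt, [3, 4, 5, 6, 7, 8])  # best candidate ends 3 m away

fde_a = min_fde(agent_a, gt)
fde_b = min_fde(agent_b, gt)
mr = miss_rate([agent_a, agent_b], [gt, gt])
```

Under these assumptions, lower values are better for both metrics, so the percentages quoted in the abstract are presumably relative reductions.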
