Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for safe and efficient navigation. This is particularly critical in developing countries, where traffic is often dense and unstructured, with heterogeneous road occupants. Existing datasets, predominantly geared towards structured and sparse traffic scenarios, fall short of capturing the complexity of driving in such environments. To fill this gap, we present IDD-X, a large-scale dual-view driving video dataset. With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects, covering 10 object categories and 19 explanation label categories. The dataset incorporates rearview information to provide a more complete representation of the driving environment. In addition, we introduce custom-designed deep networks for multiple important object localization and per-object explanation prediction. Overall, our dataset and prediction models form a foundation for studying how road conditions and surrounding entities affect driving behavior in complex traffic situations.