VLM-C4L: Continual Core Dataset Learning with Corner Case Optimization via Vision-Language Models for Autonomous Driving

With the widespread adoption and deployment of autonomous driving, handling complex environments has become an unavoidable challenge. Because extreme-scenario data is both scarce and diverse, current autonomous driving models struggle to manage corner cases effectively. This limitation poses a significant safety risk: according to the National Highway Traffic Safety Administration (NHTSA), autonomous vehicle systems are involved in hundreds of reported crashes annually in the United States, including crashes in corner cases such as sun glare and fog, a few of which were fatal. Furthermore, to keep an autonomous driving system robust and reliable over time, models must not only perform well on routine scenarios but also adapt to newly emerging ones, especially corner cases that deviate from the norm. This requires a learning mechanism that incrementally integrates new knowledge without degrading previously acquired capabilities. However, to the best of our knowledge, no continual learning method has been proposed to ensure consistent and scalable corner case learning in autonomous driving. To address these limitations, we propose VLM-C4L, a continual learning framework that uses Vision-Language Models (VLMs) to dynamically optimize and enhance corner case datasets. VLM-C4L combines VLM-guided high-quality data extraction with a core data replay strategy, enabling the model to learn incrementally from diverse corner cases while preserving performance on previously learned routine scenarios, and thus supporting long-term stability and adaptability in real-world autonomous driving. We evaluate VLM-C4L on large-scale real-world autonomous driving datasets, including Waymo and the corner case dataset CODA.

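The abstract names two ingredients, quality-based data extraction and core data replay, without giving implementation details. The following is only a minimal, generic sketch of how such a replay scheme is commonly set up, not the authors' method. The `Sample` class, its `score` field (standing in for a hypothetical VLM-produced quality score), the function names, and the `replay_fraction` parameter are all assumptions introduced for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    score: float  # hypothetical quality score, e.g. assigned by a VLM


def select_core_set(samples, k):
    """Keep the k highest-scoring samples as the replay (core) set."""
    return sorted(samples, key=lambda s: s.score, reverse=True)[:k]


def mixed_batches(new_samples, core_set, batch_size, replay_fraction, seed=0):
    """Yield batches that mix new samples with replayed core samples.

    Each batch holds up to `batch_size - n_replay` new samples plus
    `n_replay` samples drawn without replacement from the core set.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")

    rng = random.Random(seed)
    # Cap the replay count so at least one slot stays free for new data.
    n_replay = min(int(batch_size * replay_fraction), len(core_set), batch_size - 1)
    n_new = batch_size - n_replay

    shuffled = list(new_samples)
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), n_new):
        batch = shuffled[start:start + n_new]
        batch += rng.sample(core_set, n_replay)
        yield batch
```

In this sketch, a model trained on the yielded batches sees every new corner-case sample once per pass while repeatedly revisiting a small set of earlier samples, which is the general idea behind replay-based mitigation of forgetting. How the paper actually scores, selects, and schedules its core data is not stated in the abstract.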