Model-based Reinforcement Learning (MBRL) has emerged as a promising paradigm for autonomous driving, where data efficiency and robustness are critical. Yet existing solutions often rely on carefully crafted, task-specific extrinsic rewards, limiting generalization to new tasks or environments. In this paper, we propose InDRiVE (Intrinsic Disagreement-based Reinforcement for Vehicle Exploration), a method that leverages purely intrinsic, disagreement-based rewards within a Dreamer-based MBRL framework. By training an ensemble of world models, the agent actively explores high-uncertainty regions of the environment without any task-specific feedback. This approach yields a task-agnostic latent representation, enabling rapid zero-shot or few-shot fine-tuning on downstream driving tasks such as lane following and collision avoidance. Experimental results in both seen and unseen environments demonstrate that InDRiVE achieves higher success rates and fewer infractions than the DreamerV2 and DreamerV3 baselines despite using significantly fewer training steps. Our findings highlight the effectiveness of purely intrinsic exploration for learning robust vehicle-control behaviors, paving the way for more scalable and adaptable autonomous driving systems.