Learning from Human Driving: A Human-in-the-Loop Online Behavior Cloning Framework for Autonomous Driving

With the evolution of large foundation models (LFMs), data-driven autonomous driving has made significant strides. However, existing paradigms still struggle in complex interactive and long-tail scenarios because of distribution shift and causal confusion, and as a result they often lack human-level decision-making flexibility and safety in extreme conditions. To overcome these limitations, this paper proposes a Human-in-the-Loop Online Behavior Cloning framework (HiL-OBC) for autonomous driving, which aims to deeply integrate the cross-modal perceptual capabilities of LFMs with the high-level driving intelligence of human experts. Specifically, HiL-OBC is deployed in three phases: policy initialization with human intervention, latent behavioral modeling with Bayesian policy adaptation, and online deployment and updates. Furthermore, we design a Multi-modal Online Behavior Cloning (MOBC) model, which optimizes the base driving policy online through a lightweight network architecture, a takeover trigger mechanism, and a multi-variant loss function, thereby enhancing the system's decision-making robustness in complex environments. We evaluate HiL-OBC on the LangAuto-Human CARLA benchmark. Experimental results demonstrate that driving policies optimized via the human-in-the-loop mechanism achieve substantial performance gains: the driving score (DS) of StructNav, LFG, and LMDrive increases by 47.25%, 31.59%, and 32.12%, respectively. An analysis of various experimental settings and key components further highlights the advantages of human-in-the-loop learning in improving decision-making robustness and overall driving performance.
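To make the idea of takeover-triggered online behavior cloning concrete, the following is a minimal sketch under stated assumptions, not the MOBC model itself. The linear policy, the deviation-based takeover rule, the squared-error loss, the toy "human" controller, and all names (`LinearPolicy`, `takeover_triggered`, `run_online_bc`) are hypothetical stand-ins chosen for illustration; the abstract does not specify the network, the trigger condition, or the loss terms.

```python
import random


class LinearPolicy:
    """Hypothetical lightweight policy: action = w . obs + b."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def act(self, obs):
        return sum(wi * xi for wi, xi in zip(self.w, obs)) + self.b

    def bc_update(self, obs, expert_action, lr=0.05):
        # One SGD step on the behavior cloning loss 0.5 * (act(obs) - expert_action)^2.
        err = self.act(obs) - expert_action
        self.w = [wi - lr * err * xi for wi, xi in zip(self.w, obs)]
        self.b -= lr * err
        return 0.5 * err * err


def takeover_triggered(policy_action, human_action, threshold=0.2):
    # Assumed trigger: the human takes over when the policy deviates too far.
    return abs(policy_action - human_action) > threshold


def toy_human(obs):
    # Toy stand-in for a human driver's control signal (an assumption).
    return 0.8 * obs[0] - 0.5 * obs[1]


def run_online_bc(steps=2000, seed=0):
    rng = random.Random(seed)
    policy = LinearPolicy(dim=2)
    takeovers = []
    for _ in range(steps):
        obs = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        a_policy = policy.act(obs)
        a_human = toy_human(obs)
        if takeover_triggered(a_policy, a_human):
            # Learn only from the steps where the human intervened.
            policy.bc_update(obs, a_human)
            takeovers.append(1)
        else:
            takeovers.append(0)
    return policy, takeovers


if __name__ == "__main__":
    policy, takeovers = run_online_bc()
    print("takeovers in first 200 steps:", sum(takeovers[:200]))
    print("takeovers in last 200 steps:", sum(takeovers[-200:]))
```

In this toy setting the policy is updated only on steps where a takeover occurs, so the takeover frequency falls as the policy approaches the human's behavior. A real system would replace the scalar action and linear model with the multi-modal inputs and network described in the paper.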
