Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning. Learning-based approaches address this challenge with advanced generative models, removing the dependency on over-engineered architectures for representation fusion. However, simply stacking transformer blocks provides no dedicated mechanism for modeling the interactive behaviors that are common in real driving scenarios. The scarcity of interactive driving data exacerbates this problem, leaving conventional imitation learning methods ill-equipped to capture high-value interactive behaviors. We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme. Specifically, we first introduce fine-grained trajectory tokenization, which decomposes the trajectory into overlapping segments to reduce the complexity of modeling the whole trajectory at once. A carefully designed architecture then fuses planning and scene information efficiently across time and space to better capture interactive behaviors. In addition, the framework incorporates flow matching with classifier-free guidance for multi-modal behavior generation; the guidance dynamically reweights agent interactions during inference to maintain coherent response strategies, providing a critical boost to interactive scenario understanding. Experimental results on the large-scale nuPlan dataset and the challenging interactive interPlan dataset demonstrate that Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.
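To make the idea of decomposing a trajectory into overlapping segments concrete, the following is a minimal sketch, not the Flow Planner implementation. The function name `tokenize_trajectory`, the parameters `seg_len` and `overlap`, and the 2D point representation are all assumptions introduced for illustration; the abstract does not specify segment length, overlap, or how tokens are embedded.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # assumed (x, y) waypoint; the real state may be richer


def tokenize_trajectory(
    points: List[Point], seg_len: int, overlap: int
) -> List[List[Point]]:
    """Split a trajectory into fixed-length segments sharing `overlap` points.

    Hypothetical helper: consecutive segments start `seg_len - overlap`
    points apart, so each one repeats the tail of its predecessor.
    Trailing points that do not fill a complete segment are dropped.
    """
    if seg_len <= 0 or not 0 <= overlap < seg_len:
        raise ValueError("need seg_len > 0 and 0 <= overlap < seg_len")
    stride = seg_len - overlap
    segments = []
    for start in range(0, len(points) - seg_len + 1, stride):
        segments.append(points[start:start + seg_len])
    return segments


# Toy trajectory with 10 waypoints, split into 4-point tokens overlapping by 2.
trajectory = [(float(t), 0.5 * t) for t in range(10)]
tokens = tokenize_trajectory(trajectory, seg_len=4, overlap=2)
```

In this sketch each token covers a short horizon, and the shared waypoints give neighbouring tokens a common boundary, which is one plausible way overlapping segments could keep the reassembled trajectory consistent.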
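The combination of flow matching and classifier-free guidance mentioned in the abstract can be illustrated with a toy one-dimensional sampler. This is a sketch under stated assumptions, not the authors' method: `toy_velocity` is a hypothetical analytic stand-in for a learned velocity network, `sample_with_cfg` and `guidance_scale` are illustrative names, and the guidance rule used is the common form `v_uncond + w * (v_cond - v_uncond)`. The paper's mechanism for reweighting agent interactions is not reproduced here.

```python
from typing import Callable, Optional

Velocity = Callable[[float, float, Optional[float]], float]


def toy_velocity(x: float, t: float, cond: Optional[float]) -> float:
    """Hypothetical stand-in for a learned velocity field.

    For a straight-line probability path toward a single target point,
    the velocity at time t is (target - x) / (1 - t). With no condition
    the assumed target is 0.0; otherwise the target is `cond`.
    """
    target = 0.0 if cond is None else cond
    return (target - x) / (1.0 - t)


def sample_with_cfg(
    velocity: Velocity,
    x0: float,
    cond: float,
    guidance_scale: float,
    num_steps: int = 10,
) -> float:
    """Integrate the guided velocity from t=0 to t=1 with Euler steps."""
    x = x0
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v_uncond = velocity(x, t, None)
        v_cond = velocity(x, t, cond)
        # Classifier-free guidance: push the unconditional velocity
        # toward (and, for scale > 1, past) the conditional one.
        v = v_uncond + guidance_scale * (v_cond - v_uncond)
        x += dt * v
    return x


plain = sample_with_cfg(toy_velocity, x0=0.3, cond=1.0, guidance_scale=1.0)
guided = sample_with_cfg(toy_velocity, x0=0.3, cond=1.0, guidance_scale=2.0)
```

With a guidance scale of 1 the sampler follows the conditional field alone; larger scales extrapolate away from the unconditional prediction, which is the sense in which guidance strengthens the influence of the conditioning signal at inference time.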