Multi-sensor fusion is crucial for improving the performance and robustness of end-to-end autonomous driving systems. Existing methods predominantly adopt either attention-based flatten fusion or bird's eye view fusion through geometric transformations. However, these approaches often suffer from limited interpretability or the heavy computational overhead of dense representations. In this paper, we introduce GaussianFusion, a Gaussian-based multi-sensor fusion framework for end-to-end autonomous driving. Our method employs intuitive and compact Gaussian representations as intermediate carriers to aggregate information from diverse sensors. Specifically, we initialize a set of 2D Gaussians uniformly across the driving scene, where each Gaussian is parameterized by physical attributes and equipped with explicit and implicit features. These Gaussians are progressively refined by integrating multi-modal features. The explicit features capture rich semantic and spatial information about the traffic scene, while the implicit features provide complementary cues beneficial for trajectory planning. To fully exploit the rich spatial and semantic information in the Gaussians, we design a cascade planning head that iteratively refines trajectory predictions through interactions with the Gaussians. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate the effectiveness and robustness of the proposed GaussianFusion framework. The source code will be released at https://github.com/Say2L/GaussianFusion.
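To make the Gaussian representation described above concrete, the sketch below shows one plausible way to uniformly initialize 2D Gaussians over a bird's-eye-view driving scene, each carrying physical attributes (mean, scale, rotation) alongside explicit and implicit feature vectors. All function names, dimensions, and the scene range are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def init_gaussians(grid_size=16, scene_range=(-32.0, 32.0), feat_dim=8):
    """Uniformly initialize 2D Gaussians over a square BEV scene.

    Illustrative sketch only: names and dimensions are assumptions,
    not GaussianFusion's real code. Each Gaussian has physical
    attributes plus explicit/implicit feature vectors, as in the paper.
    """
    lo, hi = scene_range
    step = (hi - lo) / grid_size
    # Place Gaussian means on a uniform grid of cell centers.
    centers = lo + step * (np.arange(grid_size) + 0.5)
    xx, yy = np.meshgrid(centers, centers, indexing="ij")
    n = grid_size * grid_size
    return {
        "mean": np.stack([xx.ravel(), yy.ravel()], axis=1),  # (N, 2) positions
        "scale": np.full((n, 2), step / 2.0),                # per-axis extent
        "rotation": np.zeros(n),                             # yaw angle (rad)
        "explicit_feat": np.zeros((n, feat_dim)),            # semantic/spatial cues
        "implicit_feat": np.zeros((n, feat_dim)),            # planning cues
    }

gaussians = init_gaussians()
```

In the actual framework these attributes and features would then be progressively refined by cross-attending to multi-modal sensor features, and the cascade planning head would query them to iteratively update trajectory predictions.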