Bird's-eye View (BeV) representations have emerged as the de facto shared space in driving applications, offering a unified representation for sensor data fusion and supporting various downstream tasks. However, conventional models use grids with fixed resolution and range, and they are computationally inefficient because resources are allocated uniformly across all cells. To address this, we propose PointBeV, a novel sparse BeV segmentation model that operates on sparse BeV cells instead of dense grids. This approach offers precise control over memory usage, enabling the use of long temporal contexts and accommodating memory-constrained platforms. PointBeV employs an efficient two-pass strategy for training, which focuses computation on regions of interest. At inference time, it can be used with various memory/performance trade-offs and adjusts flexibly to new use cases. PointBeV achieves state-of-the-art results on the nuScenes dataset for vehicle, pedestrian, and lane segmentation, showing superior performance in static and temporal settings despite being trained solely with sparse signals. We release our code along with two new efficient modules used in the architecture: Sparse Feature Pulling, designed for the effective extraction of features from images to BeV, and Submanifold Attention, which enables efficient temporal modeling. The code is available at https://github.com/valeoai/PointBeV.
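To make the idea of a two-pass, sparse evaluation concrete, the following is a minimal NumPy sketch of one plausible reading of it: a coarse pass scores a regular subsample of BeV cells, and a fine pass densifies only the neighbourhoods of cells that scored highly. It is an illustration under stated assumptions, not code from the PointBeV repository. The function names (`coarse_points`, `densify`, `two_pass`, `toy_score`), the regular stride, the score threshold, and the square window are all hypothetical choices made for this example; the abstract does not specify them.

```python
import numpy as np


def coarse_points(height, width, stride):
    """Pass 1: a regular coarse subsample of BeV cell coordinates (assumed pattern)."""
    rows, cols = np.meshgrid(
        np.arange(0, height, stride), np.arange(0, width, stride), indexing="ij"
    )
    return np.stack([rows.ravel(), cols.ravel()], axis=1)


def densify(anchors, height, width, radius):
    """Pass 2: every in-bounds cell inside a square window around each anchor."""
    offsets = np.arange(-radius, radius + 1)
    d_rows, d_cols = np.meshgrid(offsets, offsets, indexing="ij")
    window = np.stack([d_rows.ravel(), d_cols.ravel()], axis=1)
    cells = (anchors[:, None, :] + window[None, :, :]).reshape(-1, 2)
    in_bounds = (
        (cells[:, 0] >= 0) & (cells[:, 0] < height)
        & (cells[:, 1] >= 0) & (cells[:, 1] < width)
    )
    return np.unique(cells[in_bounds], axis=0)


def two_pass(score_fn, height, width, stride=4, radius=2, threshold=0.5):
    """Score a sparse set of cells: coarse everywhere, dense only near high scores."""
    coarse = coarse_points(height, width, stride)
    coarse_scores = score_fn(coarse)
    anchors = coarse[coarse_scores > threshold]
    if len(anchors) == 0:
        # Nothing of interest was found, so the coarse pass is the final result.
        return coarse, coarse_scores
    fine = densify(anchors, height, width, radius)
    points = np.unique(np.concatenate([coarse, fine]), axis=0)
    # For simplicity the union is re-scored; a real model would reuse pass-1 work.
    return points, score_fn(points)


def toy_score(points):
    """Hypothetical stand-in for a network: 1.0 inside one box-shaped 'vehicle'."""
    rows, cols = points[:, 0], points[:, 1]
    inside = (rows >= 20) & (rows < 28) & (cols >= 40) & (cols < 48)
    return inside.astype(float)


points, scores = two_pass(toy_score, height=200, width=200)
print(f"evaluated {len(points)} of {200 * 200} cells")
```

In this toy setting only a small fraction of the 200 × 200 grid is ever scored, which is the kind of memory control the abstract describes. `stride`, `radius`, and `threshold` are the knobs that would trade memory against coverage in this sketch.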