FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels

LiDAR-based fully sparse architectures have garnered increasing attention. FSDv1 stands out as a representative work, achieving impressive efficacy and efficiency, albeit with intricate structures and handcrafted designs. In this paper, we present FSDv2, an evolution that aims to simplify FSDv1 while eliminating the inductive bias introduced by its handcrafted instance-level representation, thus promoting better general applicability. To this end, we introduce the concept of virtual voxels, which replaces the clustering-based instance segmentation in FSDv1. Virtual voxels not only address the notorious Center Feature Missing problem in fully sparse detectors but also make the framework more elegant and streamlined. Accordingly, we develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy. Through empirical validation, we demonstrate that the virtual voxel mechanism is functionally similar to the handcrafted clustering in FSDv1 while being more general. We conduct experiments on three large-scale datasets: Waymo Open Dataset, Argoverse 2, and nuScenes. Our results show state-of-the-art performance on all three datasets, highlighting the superiority of FSDv2 in long-range scenarios and its ability to achieve competitive performance across diverse scenarios. Moreover, we provide comprehensive experimental analysis to elucidate the workings of FSDv2. To foster reproducibility and further research, we have open-sourced FSDv2 at https://github.com/tusen-ai/SST.
