The benefit of transformers in large-scale 3D point cloud perception tasks, such as 3D object detection, is limited by their quadratic computational cost when modeling long-range relationships. In contrast, linear RNNs have low computational complexity and are well suited to long-range modeling. To this end, we propose LION, a simple and effective window-based framework built on LInear grOup RNN (i.e., linear RNNs applied to grouped features) for accurate 3D object detection. The key property of LION is that it allows sufficient feature interaction within much larger groups than transformer-based methods. However, effectively applying linear group RNNs to 3D object detection in highly sparse point clouds is non-trivial because of their limited ability to model spatial structure. To tackle this problem, we introduce a 3D spatial feature descriptor and integrate it into the linear group RNN operators to enhance their spatial features, rather than blindly increasing the number of scanning orders for voxel features. To further address the sparsity of point clouds, we propose a 3D voxel generation strategy that densifies foreground features by exploiting the auto-regressive nature of linear group RNNs. Extensive experiments verify the effectiveness of the proposed components and the generalization of LION across different linear group RNN operators, including Mamba, RWKV, and RetNet. Furthermore, our LION-Mamba achieves state-of-the-art performance on the Waymo, nuScenes, Argoverse V2, and ONCE datasets. Last but not least, our method supports a variety of advanced linear RNN operators (e.g., RetNet, RWKV, Mamba, xLSTM, and TTT) on the small but popular KITTI dataset, allowing a quick hands-on experience with our linear RNN-based framework.
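To make "linear RNNs applied to grouped features" more concrete, the following is a minimal sketch, not the LION implementation. It assumes voxel features have already been serialized into a 1D sequence (the window partition and scanning order are taken as given) and uses a toy fixed-decay linear recurrence in place of Mamba, RWKV, or RetNet, which use learned or input-dependent dynamics. The names `group_voxels`, `linear_rnn_scan`, and `linear_group_rnn` are hypothetical, and the 3D spatial feature descriptor and voxel generation steps are omitted. The point it illustrates is cost: each recurrence step touches only the running state, so a group of length L costs O(L) steps rather than the O(L^2) pairwise interactions of self-attention, which is what makes much larger groups affordable.

```python
from typing import List

Vector = List[float]


def group_voxels(features: List[Vector], group_size: int) -> List[List[Vector]]:
    """Split an already-serialized voxel feature sequence into consecutive groups.

    Hypothetical helper: assumes the ordering of voxels was decided upstream.
    """
    return [features[i:i + group_size] for i in range(0, len(features), group_size)]


def linear_rnn_scan(group: List[Vector], decay: float) -> List[Vector]:
    """Toy linear recurrence h_t = decay * h_{t-1} + x_t, applied per channel.

    One pass over the group, so the cost is O(L * C) for L voxels and C channels,
    versus O(L^2 * C) for full self-attention over the same group.
    """
    channels = len(group[0])
    h = [0.0] * channels  # hidden state starts at zero for every group
    outputs: List[Vector] = []
    for x in group:
        h = [decay * h_c + x_c for h_c, x_c in zip(h, x)]
        outputs.append(h)
    return outputs


def linear_group_rnn(features: List[Vector], group_size: int, decay: float = 0.5) -> List[Vector]:
    """Run the toy recurrence independently inside each group and concatenate results."""
    out: List[Vector] = []
    for group in group_voxels(features, group_size):
        out.extend(linear_rnn_scan(group, decay))
    return out


if __name__ == "__main__":
    # Four single-channel voxels split into two groups of two; state resets per group.
    print(linear_group_rnn([[1.0], [1.0], [1.0], [1.0]], group_size=2, decay=0.5))
```

Because the state is reset at each group boundary, information flows only within a group, mirroring the window-based design described above; a larger `group_size` widens the interaction range while the per-voxel cost stays constant.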