3D object detection in point clouds is important for autonomous driving systems. A primary challenge stems from the sparse distribution of points within the 3D scene. Existing high-performance methods typically employ 3D sparse convolutional neural networks with small kernels to extract features. To reduce computational costs, these methods resort to submanifold sparse convolutions, which prevent information exchange among spatially disconnected features. Some recent approaches have attempted to address this problem by introducing large-kernel convolutions or self-attention mechanisms, but they either achieve limited accuracy improvements or incur excessive computational costs. We propose HEDNet, a hierarchical encoder-decoder network for 3D object detection, which leverages encoder-decoder blocks to capture long-range dependencies among features in space, particularly for large and distant objects. We conducted extensive experiments on the Waymo Open and nuScenes datasets. HEDNet achieved higher detection accuracy than previous state-of-the-art methods on both datasets, with competitive efficiency. The code is available at https://github.com/zhanggang001/HEDNet.
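The claim that submanifold sparse convolutions block information exchange among spatially disconnected features can be illustrated with a toy example. The following is a minimal 1D sketch in plain Python, not HEDNet's implementation: the helpers `sparse_conv1d` and `stack`, the dict-based feature layout, and the all-ones kernel are assumptions made only for illustration. It compares a stack of submanifold layers, which compute outputs only at already-active sites, with a stack of regular sparse layers, which dilate the active set.

```python
def sparse_conv1d(features, kernel, submanifold):
    """Toy 1D sparse convolution over a dict {site: value} of active sites."""
    radius = len(kernel) // 2
    offsets = range(-radius, radius + 1)
    if submanifold:
        # Submanifold: outputs exist only where the input is already active.
        out_sites = set(features)
    else:
        # Regular sparse conv: every site within kernel reach becomes active.
        out_sites = {site + d for site in features for d in offsets}
    return {
        site: sum(kernel[d + radius] * features[site + d]
                  for d in offsets if site + d in features)
        for site in sorted(out_sites)
    }


def stack(features, kernel, num_layers, submanifold):
    """Apply the same toy layer num_layers times."""
    for _ in range(num_layers):
        features = sparse_conv1d(features, kernel, submanifold)
    return features


# Two disconnected clusters: sites 0..2 and sites 6..8 (sites 3..5 are empty).
# Only site 0 carries a nonzero signal.
inputs = {0: 1.0, 1: 0.0, 2: 0.0, 6: 0.0, 7: 0.0, 8: 0.0}
kernel = [1.0, 1.0, 1.0]

sub = stack(inputs, kernel, num_layers=6, submanifold=True)
reg = stack(inputs, kernel, num_layers=6, submanifold=False)

print("submanifold, second cluster:", [sub[s] for s in (6, 7, 8)])
print("regular, site 6:", reg[6])
print("active sites:", len(sub), "vs", len(reg))
```

In this toy setting the submanifold stack leaves the second cluster untouched by the signal at site 0 no matter how many layers are applied, while the regular stack eventually bridges the gap at the cost of a growing active set, which is the efficiency trade-off the abstract alludes to.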