What You See Is What You Detect: Towards better Object Densification in 3D detection

Recent work has demonstrated the importance of object completion in 3D perception from LiDAR signals. Several methods use dedicated modules to densify the point clouds produced by laser scanners, leading to better recall and more accurate results. Continuing in this direction, we present a counter-intuitive perspective: the widely used full-shape completion approach actually leads to a higher upper bound on the error, especially for faraway objects and small objects such as pedestrians. Based on this observation, we introduce a visible-part completion method that requires only 11.3\% of the prediction points that previous methods generate. To recover the dense representation, we propose a mesh-deformation-based method to augment the point set associated with visible foreground objects. Because our approach focuses only on the visible part of the foreground objects to achieve accurate 3D detection, we name our method What You See Is What You Detect (WYSIWYD). The proposed method is a detector-independent model that consists of two parts: an Intra-Frustum Segmentation Transformer (IFST) and a Mesh Depth Completion Network (MDCNet) that predicts the foreground depth from mesh deformation. As a result, our model does not require the time-consuming full-depth completion task used by most pseudo-LiDAR-based methods. Our experimental evaluation shows that our approach provides performance improvements of up to 12.2\% over most of the public baseline models on the KITTI and nuScenes datasets, bringing the state of the art to a new level. The code will be available at \textcolor[RGB]{0,0,255}{\url{https://github.com/Orbis36/WYSIWYD}}.
