Ground-truth annotations of 3D bounding boxes are inherently ambiguous because of occlusion, missing signal, or manual annotation errors. This ambiguity can confuse deep 3D object detectors during training and degrade detection accuracy. However, existing methods overlook such issues to some extent and treat the labels as deterministic. In this paper, we formulate label uncertainty as the diversity of plausible bounding boxes for an object and propose GLENet, a generative framework adapted from conditional variational autoencoders, which uses latent variables to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes. The label uncertainty generated by GLENet is plug-and-play: it can be conveniently integrated into existing deep 3D detectors to build probabilistic detectors and to supervise the learning of localization uncertainty. In addition, we propose an uncertainty-aware quality estimator architecture for probabilistic detectors that uses the predicted localization uncertainty to guide the training of the IoU branch. We incorporate the proposed methods into various popular base 3D detectors and demonstrate significant and consistent performance gains on both the KITTI and Waymo benchmark datasets. In particular, the proposed GLENet-VR outperforms all published LiDAR-based approaches by a large margin and ranks $1^{st}$ among single-modal methods on the challenging KITTI test set. The code is available at https://github.com/Eaphan/GLENet.
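To make the notion of "diversity of plausible bounding boxes" concrete, the following is a minimal sketch and not the GLENet implementation. It assumes that label uncertainty can be summarized as the per-dimension variance of boxes decoded from several latent samples, which is one plausible reading of the formulation above. The names `toy_decoder` and `estimate_label_uncertainty`, the linear decoder, the 7-parameter box layout, and the sample count are all hypothetical and chosen only for illustration.

```python
import numpy as np


def toy_decoder(context, z, weight):
    """Hypothetical stand-in for a learned decoder.

    Maps an object-level context vector and a latent sample z to a
    7-parameter box (x, y, z, length, width, height, yaw). A real model
    would be a neural network; a linear map keeps the sketch runnable.
    """
    return context + weight @ z


def estimate_label_uncertainty(context, weight, num_samples=30, seed=0):
    """Sample latent variables from a standard normal prior, decode one
    box per sample, and summarize the spread of the decoded boxes.

    Returns the per-dimension mean and variance of the sampled boxes.
    Treating the variance as the label uncertainty is an assumption
    made for this sketch.
    """
    rng = np.random.default_rng(seed)
    latent_dim = weight.shape[1]
    latents = rng.standard_normal((num_samples, latent_dim))
    boxes = np.stack([toy_decoder(context, z, weight) for z in latents])
    return boxes.mean(axis=0), boxes.var(axis=0)


if __name__ == "__main__":
    # Illustrative values only: a box estimate and a decoder in which the
    # latent code affects just the x coordinate.
    context = np.array([1.0, 2.0, 0.5, 4.0, 1.8, 1.5, 0.1])
    weight = np.zeros((7, 4))
    weight[0, 0] = 0.3
    mean_box, label_variance = estimate_label_uncertainty(context, weight)
    print("mean box:", mean_box)
    print("per-dimension variance:", label_variance)
```

In this toy setup only the x coordinate varies across samples, so its variance is positive while the other six stay at zero. In a probabilistic detector, a variance of this kind could serve as the target against which the predicted localization uncertainty is supervised.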