We introduce a technique for detecting 3D objects and estimating their pose from a single image. Our method builds on a state-of-the-art approach [1] and improves on its accuracy. Unlike frameworks that rely only on centre-point predictions, our approach first uses a Deep Convolutional Neural Network (DCNN) to estimate 3D properties of an object, then combines these estimates with the geometric constraints provided by a 2D bounding box to produce a complete 3D bounding box. The first output of the network estimates the 3D object orientation using a discrete-continuous loss [1]. The second output predicts the 3D object dimensions with minimal variance. We also extend the baseline with light-weight feature extractors and a customized multibin architecture. On the KITTI 3D detection benchmark [2], the resulting 3D object pose estimates are comparable to or better than those of our baseline [1].
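To make the discrete-continuous orientation output concrete, the following is a minimal sketch of how such a prediction could be decoded into a single angle. It assumes a multibin-style formulation in the spirit of [1]: the angle range is split into bins, the network outputs one confidence per bin plus a (cos, sin) residual relative to each bin centre, and the most confident bin is refined by its residual. The function names, the evenly spaced bin layout, and the input format are illustrative assumptions, not the architecture used in this work.

```python
import math


def bin_centers(num_bins):
    """Evenly spaced bin centres over [0, 2*pi). This layout is an assumption."""
    return [2.0 * math.pi * i / num_bins for i in range(num_bins)]


def decode_orientation(confidences, residuals):
    """Decode a discrete-continuous orientation prediction (hypothetical helper).

    confidences: one score per bin (the discrete part).
    residuals:   one (cos, sin) pair per bin, encoding the angular offset
                 from that bin's centre (the continuous part).
    Returns an angle in radians, wrapped to [-pi, pi).
    """
    centers = bin_centers(len(confidences))
    # Discrete step: choose the bin with the highest confidence.
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    # Continuous step: recover the offset from the bin centre.
    cos_r, sin_r = residuals[best]
    angle = centers[best] + math.atan2(sin_r, cos_r)
    # Wrap the result to [-pi, pi).
    return (angle + math.pi) % (2.0 * math.pi) - math.pi
```

With two bins (centres at 0 and π), a prediction that favours the second bin with a residual of 0.1 rad decodes to π + 0.1, which wraps to 0.1 − π.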