Automatically localizing a target object in an image is crucial for many computer vision applications. Ellipse representations have recently been identified as an alternative to axis-aligned bounding boxes for object localization. This paper considers 3D-aware ellipse labels for 2D target localization, i.e., ellipses obtained by projecting a 3D ellipsoidal approximation of the object into the image. Such generic ellipsoidal models can handle targets that are only coarsely known, and 3D-aware ellipse detections carry more geometric information about the object than traditional 3D-agnostic bounding box labels. We propose to take a new look at ellipse regression and replace the geometric ellipse parameters with the parameters of an implicit Gaussian distribution that encodes object occupancy in the image. The models are trained to regress the values of this bivariate Gaussian distribution over the image pixels using a continuous statistical loss function. We introduce a novel non-trainable differentiable layer, E-DSNT, to extract the distribution parameters. We also describe how to readily generate consistent 3D-aware Gaussian occupancy parameters using only coarse dimensions of the target and relative pose labels. To validate our hypothesis, we extend three existing spacecraft pose estimation datasets with 3D-aware Gaussian occupancy labels.
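To make the idea of a Gaussian occupancy map concrete, the sketch below evaluates a bivariate Gaussian over the pixel grid and then recovers its mean and covariance from the map with weighted spatial moments. The abstract does not define E-DSNT, so the moment computation here is only an assumption about what a non-trainable, differentiable parameter-extraction step could look like, in the spirit of DSNT-style layers; it is not the authors' implementation. The function names `gaussian_occupancy`, `heatmap_moments`, and `covariance_to_ellipse` are hypothetical, and NumPy is used for illustration only.

```python
import numpy as np


def gaussian_occupancy(height, width, mean, cov):
    """Evaluate an unnormalized bivariate Gaussian at every pixel.

    mean is (x, y) in pixel coordinates; cov is a 2x2 covariance matrix.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    d = np.stack([xs - mean[0], ys - mean[1]], axis=-1)  # shape (H, W, 2)
    precision = np.linalg.inv(cov)
    # Squared Mahalanobis distance of each pixel to the mean.
    maha = np.einsum("hwi,ij,hwj->hw", d, precision, d)
    return np.exp(-0.5 * maha)


def heatmap_moments(heatmap):
    """Recover (mean, covariance) from a non-negative heatmap.

    Assumption: the map is treated as a discrete probability mass after
    normalization. Only sums and products are used, so the same
    computation is differentiable in an autodiff framework.
    """
    h, w = heatmap.shape
    p = heatmap / heatmap.sum()
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    mean = np.array([(p * xs).sum(), (p * ys).sum()])
    dx, dy = xs - mean[0], ys - mean[1]
    cxy = (p * dx * dy).sum()
    cov = np.array([[(p * dx * dx).sum(), cxy],
                    [cxy, (p * dy * dy).sum()]])
    return mean, cov


def covariance_to_ellipse(cov, scale=1.0):
    """Convert a covariance matrix to ellipse semi-axes and orientation.

    scale is a hypothetical factor relating standard deviations to the
    drawn ellipse boundary. Returns (semi_major, semi_minor, angle_rad).
    """
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    semi_minor, semi_major = scale * np.sqrt(eigvals)
    major_dir = eigvecs[:, 1]
    angle = np.arctan2(major_dir[1], major_dir[0])
    return semi_major, semi_minor, angle


if __name__ == "__main__":
    true_mean = np.array([40.0, 25.0])
    true_cov = np.array([[36.0, 10.0], [10.0, 16.0]])
    occupancy = gaussian_occupancy(64, 80, true_mean, true_cov)
    est_mean, est_cov = heatmap_moments(occupancy)
    print("mean:", est_mean)
    print("cov:", est_cov)
    print("ellipse:", covariance_to_ellipse(est_cov))
```

The recovered moments match the generating parameters closely only when the Gaussian lies well inside the image; a target truncated by the image border would bias both the mean and the covariance, which is one reason a real layer may need additional care.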