Automatically localizing a target object in an image is crucial for many computer vision applications. To represent the 2D object, ellipse labels have recently been identified as a promising alternative to axis-aligned bounding boxes. This paper further considers 3D-aware ellipse labels, \textit{i.e.}, ellipses that are projections of a 3D ellipsoidal approximation of the object, for 2D target localization. Indeed, projected ellipses carry more information about the object's geometry and pose (3D awareness) than traditional 3D-agnostic bounding box labels. Moreover, such a generic 3D ellipsoidal model can approximate targets whose geometry ranges from known to only coarsely known. We then revisit ellipse regression, replacing the discontinuous geometric ellipse parameters with the parameters of an implicit Gaussian distribution that encodes object occupancy in the image. The models are trained with a statistical loss function to regress the values of this bivariate Gaussian distribution over the image pixels. We introduce a novel non-trainable differentiable layer, E-DSNT, to extract the distribution parameters. We also describe how to readily generate consistent 3D-aware Gaussian occupancy parameters using only the coarse dimensions of the target and relative pose labels. To validate our hypothesis, we extend three existing spacecraft pose estimation datasets with 3D-aware Gaussian occupancy labels. Labels and source code are publicly available at https://cvi2.uni.lu/3d-aware-obj-loc/.
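The label-generation and parameter-extraction ideas above can be illustrated with a short NumPy sketch. It is not the authors' implementation and does not claim to reproduce E-DSNT. It uses the standard dual-quadric projection C* ∝ P Q* Pᵀ, with P = K[R | t], to project an ellipsoid with coarse semi-axes onto an image ellipse. It then assumes the ellipse boundary is the 1σ contour of the occupancy Gaussian (Σ = A; the paper's scale factor may differ), renders the normalized Gaussian occupancy map, and reads back the mean and covariance as the first and second moments of the map, in the spirit of DSNT. The function names `project_ellipsoid`, `gaussian_heatmap` and `heatmap_moments` are hypothetical, as are the camera intrinsics `K` and the pose used in the example.

```python
import numpy as np


def project_ellipsoid(semi_axes, R, t, K):
    """Project an ellipsoid (object frame, centred at origin) to an image ellipse.

    Standard dual-quadric projection C* ~ P Q* P^T with P = K [R | t].
    Returns the ellipse centre mu (pixels) and a 2x2 shape matrix A such that
    (x - mu)^T A^{-1} (x - mu) = 1 on the ellipse boundary.
    Assumes the ellipsoid lies entirely in front of the camera.
    """
    Q_dual = np.diag([*(np.asarray(semi_axes, float) ** 2), -1.0])  # dual quadric, object frame
    P = K @ np.hstack([R, np.reshape(t, (3, 1))])                   # 3x4 camera matrix
    C_dual = P @ Q_dual @ P.T                                       # 3x3 dual conic
    C_dual = C_dual / -C_dual[2, 2]                                 # normalise so C*[2, 2] = -1
    mu = -C_dual[:2, 2]                                             # ellipse centre
    A = C_dual[:2, :2] + np.outer(mu, mu)                           # ellipse shape matrix
    return mu, A


def gaussian_heatmap(mu, cov, height, width):
    """Render a bivariate Gaussian over the pixel grid, normalised to sum to 1."""
    xs, ys = np.meshgrid(np.arange(width, dtype=float), np.arange(height, dtype=float))
    d = np.stack([xs - mu[0], ys - mu[1]], axis=-1)                 # (H, W, 2) offsets, (x, y) order
    m = np.einsum("hwi,ij,hwj->hw", d, np.linalg.inv(cov), d)       # squared Mahalanobis distance
    h = np.exp(-0.5 * m)
    return h / h.sum()


def heatmap_moments(h):
    """DSNT-style readout extended to second moments: mean and covariance of a map."""
    height, width = h.shape
    xs, ys = np.meshgrid(np.arange(width, dtype=float), np.arange(height, dtype=float))
    p = h / h.sum()                                                 # treat the map as a distribution
    mx, my = (p * xs).sum(), (p * ys).sum()                         # first moments
    dx, dy = xs - mx, ys - my
    cxy = (p * dx * dy).sum()
    cov = np.array([[(p * dx * dx).sum(), cxy], [cxy, (p * dy * dy).sum()]])  # second central moments
    return np.array([mx, my]), cov


# Hypothetical example: ellipsoid with coarse semi-axes (1, 0.5, 0.5) placed 10 units in front of
# a camera with focal length 100 px and principal point (64, 64).
K = np.array([[100.0, 0.0, 64.0], [0.0, 100.0, 64.0], [0.0, 0.0, 1.0]])
mu, A = project_ellipsoid([1.0, 0.5, 0.5], np.eye(3), [0.0, 0.0, 10.0], K)
h = gaussian_heatmap(mu, A, 128, 128)  # assumption: Sigma = A
mu_hat, cov_hat = heatmap_moments(h)
print(mu, A)
print(mu_hat, cov_hat)
```

Because the readout is only a weighted sum over pixel coordinates, the same operations written in an autodiff framework are differentiable and have no trainable weights, which is the property the abstract attributes to E-DSNT. In a network, the map would come from the model's output (for instance normalized with a softmax, as in DSNT) rather than from a rendered Gaussian.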