Predicting the pose of objects from a single image is an important but difficult computer vision problem. Methods that predict a single point estimate perform poorly on objects with symmetries and cannot represent uncertainty. Alternatively, some works predict a distribution over orientations in $\mathrm{SO}(3)$. However, training such models can be computationally expensive and sample-inefficient. Instead, we propose a novel mapping of features from the image domain to the 3D rotation manifold. Our method then leverages $\mathrm{SO}(3)$-equivariant layers, which are more sample-efficient, and outputs a distribution over rotations that can be sampled at arbitrary resolution. We demonstrate the effectiveness of our method at object orientation prediction and achieve state-of-the-art performance on the popular PASCAL3D+ dataset. Moreover, we show that our method can model complex object symmetries without any modifications to the parameters or loss function. Code is available at https://dmklee.github.io/image2sphere.