We introduce CGA-PoseNet, which uses the 1D-Up approach to Conformal Geometric Algebra (CGA) to represent rotations and translations with a single mathematical object, the motor, for camera pose regression. Our starting point is PoseNet, which successfully predicts camera poses from small datasets of RGB frames. State-of-the-art methods, however, require expensive tuning to balance the orientational and translational components of the camera pose. This is usually done through a complex, ad hoc loss function, and in some cases requires 3D points in addition to images. Our approach instead unifies the camera position and orientation through the motor. Consequently, the network searches for a single object that lives in a well-behaved 4D space with a Euclidean signature. This means that we can handle image-only datasets and work efficiently with a simple loss function, namely the mean squared error (MSE) between the predicted and ground-truth motors. We show that it is possible to achieve high-accuracy camera pose regression with a significantly simpler problem formulation. The 1D-Up approach to CGA can thus be employed to overcome the dichotomy between translational and orientational components in camera pose regression in a compact and elegant way.
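As an illustration of the loss described above, the following is a minimal sketch of an MSE between two motors stored as coefficient vectors. It is not the authors' implementation. It assumes each motor is an even-grade element of a 4D algebra and therefore has 8 coefficients (1 scalar, 6 bivector, 1 pseudoscalar). The coefficient ordering and the function name `motor_mse` are hypothetical choices made for this example.

```python
import numpy as np


def motor_mse(pred, target):
    """Mean squared error between two motor coefficient arrays.

    Assumption: the last axis holds the 8 coefficients of an even-grade
    element of a 4D geometric algebra (1 scalar, 6 bivector,
    1 pseudoscalar). The ordering is a convention chosen for this sketch.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("predicted and ground-truth motors must share a shape")
    if pred.shape[-1] != 8:
        raise ValueError("expected 8 motor coefficients on the last axis")
    # One term for the whole pose: no separate weights for the
    # rotational and translational parts.
    return float(np.mean((pred - target) ** 2))


# Identity motor under the assumed layout: scalar part 1, all else 0.
identity = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# A hypothetical prediction that is off by 0.1 in every coefficient.
pred = identity + 0.1

loss = motor_mse(pred, identity)
print(loss)
```

Because rotation and translation are encoded in the same object, the loss needs no weighting hyperparameter between a position term and an orientation term, which is the simplification the abstract refers to.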