Estimating 3D rotations is a common procedure in 3D computer vision, and its accuracy depends heavily on the rotation representation. One representation, the rotation matrix, is popular because it is continuous, especially for pose estimation tasks. The learning process usually incorporates orthogonalization to ensure that the predicted matrices are orthonormal. Our work reveals, through gradient analysis, that common orthogonalization procedures based on the Gram-Schmidt process and singular value decomposition reduce training efficiency. To this end, we advocate removing orthogonalization from the learning process and learning unorthogonalized "Pseudo" Rotation Matrices (PRoM). An optimization analysis shows that PRoM converges faster and to a better solution. By replacing the orthogonalization-incorporated representation with our proposed PRoM in various rotation-related tasks, we achieve state-of-the-art results on large-scale benchmarks for human pose estimation.
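To make the contrast concrete, here is a minimal NumPy sketch of the two orthogonalization procedures the abstract names, next to a loss computed directly on an unorthogonalized matrix. It is an illustration under stated assumptions, not the authors' implementation: the function names `gram_schmidt` and `svd_orthogonalize`, the squared-error loss, and the perturbed identity standing in for a network output are all hypothetical choices made for this example.

```python
import numpy as np


def gram_schmidt(m):
    """Build a rotation from the first two columns of a 3x3 matrix.

    The first column is normalized, the second is made orthogonal to it
    and normalized, and the third is their cross product.
    """
    a1, a2 = m[:, 0], m[:, 1]
    b1 = a1 / np.linalg.norm(a1)
    a2_perp = a2 - np.dot(b1, a2) * b1
    b2 = a2_perp / np.linalg.norm(a2_perp)
    b3 = np.cross(b1, b2)
    return np.stack([b1, b2, b3], axis=1)


def svd_orthogonalize(m):
    """Project a 3x3 matrix onto the nearest rotation matrix via SVD."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so the determinant is +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


# Hypothetical stand-in for a raw 3x3 network output (assumption for this sketch).
rng = np.random.default_rng(0)
raw = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
target = np.eye(3)

# Orthogonalization-incorporated losses: the loss sees the orthogonalized matrix,
# so gradients would have to flow back through the orthogonalization step.
loss_gs = np.sum((gram_schmidt(raw) - target) ** 2)
loss_svd = np.sum((svd_orthogonalize(raw) - target) ** 2)

# "Pseudo" rotation matrix loss: the raw matrix is compared to the target directly.
loss_pseudo = np.sum((raw - target) ** 2)

print(loss_gs, loss_svd, loss_pseudo)
```

In this sketch the raw matrix is not itself a valid rotation; one plausible usage, assumed here rather than stated in the abstract, is to apply `svd_orthogonalize` or `gram_schmidt` only after training, when a valid rotation is required.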