Camera Calibration through Geometric Constraints from Rotation and Projection Matrices

Camera calibration estimates the intrinsic and extrinsic parameters of a camera, which are essential for tasks such as 3D reconstruction, object tracking, and augmented reality. In this work, we propose a novel constraint-based loss for estimating the intrinsic (focal length $(f_x, f_y)$ and principal point $(p_x, p_y)$) and extrinsic (baseline $b$, disparity $d$, translation $(t_x, t_y, t_z)$, and rotation, specifically pitch $\theta_p$) camera parameters. The constraints are based on geometric properties inherent in the camera model, including the anatomy of the projection matrix (vanishing points, image of the world origin, axis planes) and the orthonormality of the rotation matrix. We combine them into an Unsupervised Geometric Constraint Loss (UGCL) within a multitask learning framework. Our methodology is a hybrid approach: it uses the learning power of a neural network to estimate the desired parameters while enforcing the mathematical properties of the camera projection matrix. This approach not only enhances the interpretability of the model but also facilitates a more informed learning process. Additionally, we introduce a new CVGL Camera Calibration dataset, featuring over 900 configurations of camera parameters and 63,600 image pairs that closely mirror real-world conditions. Trained and tested on both synthetic and real-world datasets, our approach demonstrates improvements across all parameters when compared to state-of-the-art (SOTA) benchmarks. The code and the updated dataset can be found here: https://github.com/CVLABLUMS/CVGL-Camera-Calibration
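The geometric properties named in the abstract can be illustrated with a minimal sketch. This is not the authors' UGCL implementation, whose exact form is not given here. It assumes a zero-skew pinhole model $P = K[R \mid t]$ and treats pitch as a rotation about the camera x-axis; all function names are hypothetical. Under these assumptions, the first three columns of $P$ are the vanishing points of the world axes, the fourth column is the image of the world origin, and $R^\top R = I$ gives a residual that a loss could penalize.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def pitch_rotation(theta_p):
    """Rotation about the x-axis by theta_p radians (assumed pitch convention)."""
    c, s = math.cos(theta_p), math.sin(theta_p)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def orthonormality_residual(r):
    """Squared Frobenius norm of R^T R - I; zero for a valid rotation."""
    rtr = matmul(transpose(r), r)
    return sum((rtr[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(3) for j in range(3))

def projection_matrix(fx, fy, px, py, r, t):
    """Build the 3x4 matrix P = K [R | t] for a zero-skew pinhole camera."""
    k = [[fx, 0.0, px],
         [0.0, fy, py],
         [0.0, 0.0, 1.0]]
    rt = [list(r[i]) + [t[i]] for i in range(3)]
    return matmul(k, rt)

def column(m, j):
    return [row[j] for row in m]

def dehomogenize(p):
    """Convert a homogeneous image point (x, y, w) with w != 0 to pixels."""
    x, y, w = p
    return (x / w, y / w)

if __name__ == "__main__":
    # Illustrative values only; they are not taken from the CVGL dataset.
    R = pitch_rotation(0.1)
    P = projection_matrix(800.0, 800.0, 320.0, 240.0, R, [0.0, 0.0, 5.0])
    print("orthonormality residual:", orthonormality_residual(R))
    print("image of world origin:", dehomogenize(column(P, 3)))
    print("vanishing point of world Z-axis:", dehomogenize(column(P, 2)))
```

In this configuration the world X-axis is parallel to the image plane, so its vanishing point (column 0 of $P$) has a zero third coordinate and lies at infinity; it must not be passed to `dehomogenize`.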
