Neural Radiance Fields (NeRF) use multi-view images for 3D scene representation and have demonstrated remarkable performance. Multi-camera systems, one of the primary sources of multi-view images, pose challenges such as varying intrinsic parameters and frequent pose changes. Most previous NeRF-based methods assume a single camera and rarely consider multi-camera scenarios. Moreover, NeRF methods that can optimize intrinsic and extrinsic parameters remain susceptible to suboptimal solutions when these parameters are poorly initialized. In this paper, we propose MC-NeRF, a method that jointly optimizes intrinsic and extrinsic parameters alongside NeRF and allows each image to have its own independent camera parameters. First, we tackle the coupling issue and the degenerate case that arise from the joint optimization of intrinsic and extrinsic parameters. Second, based on the proposed solutions, we introduce an efficient calibration image acquisition scheme for multi-camera systems, including the design of the calibration object. Finally, we present an end-to-end network with a training sequence that estimates the intrinsic and extrinsic parameters together with the rendering network. Furthermore, recognizing that most existing datasets are designed for a single camera, we construct a real multi-camera image acquisition system and create a corresponding new dataset that includes both simulated data and real-world captured images. Experiments confirm the effectiveness of our method when each image corresponds to different camera parameters. Specifically, we use multiple cameras in a real-world system, each with different intrinsic and extrinsic parameters, to achieve 3D scene representation without providing initial poses.
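To give a feel for why intrinsic and extrinsic parameters can be coupled, the following minimal sketch shows a textbook pinhole-camera ambiguity: for a planar target parallel to the image plane, scaling the focal length and the camera distance by the same factor produces identical pixels, whereas a point off that plane breaks the tie. This is only an illustration under simplifying assumptions (identity rotation, no distortion). The abstract does not specify which degenerate case MC-NeRF addresses, and the names `Camera`, `project`, and `reprojection_error` are hypothetical, not part of the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


@dataclass
class Camera:
    """Hypothetical per-image pinhole camera (assumption: rotation is identity)."""
    fx: float  # focal length in pixels, x axis
    fy: float  # focal length in pixels, y axis
    cx: float  # principal point, x
    cy: float  # principal point, y
    t: Vec3    # translation applied to world points to reach the camera frame


def project(cam: Camera, point: Vec3) -> Pixel:
    """Project a world point to pixel coordinates using x_cam = point + t."""
    x, y, z = (p + ti for p, ti in zip(point, cam.t))
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def reprojection_error(cam: Camera, points: List[Vec3], pixels: List[Pixel]) -> float:
    """Sum of squared pixel residuals between projections and observations."""
    total = 0.0
    for point, (u_obs, v_obs) in zip(points, pixels):
        u, v = project(cam, point)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


# A planar target at world z = 0, parallel to the image plane.
plane = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (1.0, 1.0, 0.0), (-1.0, 1.0, 0.0)]
# A single point that lies off the plane.
off_plane = (1.0, 1.0, 0.5)

# Two different cameras: focal length and distance are both doubled in cam_b.
cam_a = Camera(fx=500.0, fy=500.0, cx=320.0, cy=240.0, t=(0.0, 0.0, 2.0))
cam_b = Camera(fx=1000.0, fy=1000.0, cx=320.0, cy=240.0, t=(0.0, 0.0, 4.0))

# Observations generated by cam_a are explained equally well by cam_b.
pixels_a = [project(cam_a, p) for p in plane]
planar_error = reprojection_error(cam_b, plane, pixels_a)

# The off-plane point projects differently, so it can disambiguate the two.
offplane_gap = abs(project(cam_a, off_plane)[0] - project(cam_b, off_plane)[0])

print(f"planar reprojection error of cam_b: {planar_error:.6f}")
print(f"off-plane pixel gap between cameras: {offplane_gap:.2f}")
```

In this toy setup the planar observations alone cannot distinguish `cam_a` from `cam_b`, which is one reason calibration targets with 3D structure are often preferred when focal length and pose are estimated together.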