Learning radiance fields with depth rendering and depth supervision has been shown to improve both the quality and the convergence of view synthesis. However, this paradigm requires the input RGB-D sequences to be synchronized, which hinders its use in UAV city modeling. Because high-speed flight introduces asynchrony between RGB and depth images, we propose a novel time-pose function, an implicit network that maps timestamps to $\rm SE(3)$ elements. To simplify training, we also design a joint optimization scheme that learns the large-scale depth-regularized radiance fields and the time-pose function together. Our algorithm consists of three steps: (1) time-pose function fitting, (2) radiance field bootstrapping, and (3) joint pose error compensation and radiance field refinement. In addition, we introduce a large synthetic dataset with diverse controlled mismatches and ground truth to evaluate this new problem setting systematically. Through extensive experiments, we demonstrate that our method outperforms baselines without regularization. We also show qualitatively improved results on a real-world asynchronous RGB-D sequence captured by a drone. Code, data, and models will be made publicly available.
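To make the idea of a time-pose function concrete, the following is a minimal sketch of a network that maps a scalar timestamp to a $4 \times 4$ rigid transform. Everything in it is an assumption made for illustration and is not taken from the paper: the class name `TimePoseFunction`, the sinusoidal timestamp encoding, the single hidden layer, the axis-angle rotation parameterization, and the random untrained weights. In the described pipeline, such a function would first be fitted to known poses and later refined jointly with the radiance field.

```python
import math
import random


def rodrigues(w):
    """Convert an axis-angle vector (length 3) to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(x * x for x in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (x / theta for x in w)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]  # skew-symmetric matrix
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    R = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            KK = sum(K[i][m] * K[m][j] for m in range(3))
            # Rodrigues' formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
            R[i][j] = (1.0 if i == j else 0.0) + s * K[i][j] + c * KK
    return R


class TimePoseFunction:
    """Hypothetical implicit network: timestamp -> 4x4 rigid transform."""

    def __init__(self, hidden=16, num_freqs=4, seed=0):
        rng = random.Random(seed)
        self.num_freqs = num_freqs
        in_dim = 1 + 2 * num_freqs
        # Randomly initialized weights; a real system would train these.
        self.W1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.W2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(6)]
        self.b2 = [0.0] * 6

    def encode(self, t):
        """Sinusoidal encoding of the timestamp (an assumed design choice)."""
        feats = [t]
        for k in range(self.num_freqs):
            feats += [math.sin(2 ** k * math.pi * t), math.cos(2 ** k * math.pi * t)]
        return feats

    def __call__(self, t):
        x = self.encode(t)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        out = [sum(w * hi for w, hi in zip(row, h)) + b
               for row, b in zip(self.W2, self.b2)]
        R = rodrigues(out[:3])   # first three outputs: axis-angle rotation
        trans = out[3:]          # last three outputs: translation
        return [R[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Usage: query poses at the (different) timestamps of an RGB and a depth frame.
f = TimePoseFunction()
pose_rgb = f(0.50)
pose_depth = f(0.53)
```

Because the pose is a continuous function of time, RGB and depth frames captured at different instants can each be assigned their own pose, which is what removes the need for synchronized RGB-D input.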