Autonomous driving systems rely heavily on multi-view images to ensure accurate perception and robust decision-making. To effectively develop and evaluate perception stacks and planning algorithms, realistic closed-loop simulators are indispensable. While 3D reconstruction techniques such as Gaussian Splatting offer promising avenues for simulator construction, the rendered novel views often exhibit artifacts, particularly in extrapolated perspectives or when available observations are sparse. We introduce ViewMorpher3D, a multi-view image enhancement framework based on image diffusion models, designed to elevate photorealism and multi-view coherence in driving scenes. Unlike single-view approaches, ViewMorpher3D jointly processes a set of rendered views conditioned on camera poses, 3D geometric priors, and temporally adjacent or spatially overlapping reference views. This enables the model to infer missing details, suppress rendering artifacts, and enforce cross-view consistency. Our framework accommodates variable numbers of cameras and flexible reference/target view configurations, making it adaptable to diverse sensor setups. Experiments on real-world driving datasets demonstrate substantial improvements in image quality metrics, effectively reducing artifacts while preserving geometric fidelity.