Extrapolative novel view synthesis can reduce camera-rig dependency in autonomous driving by generating standardized virtual views from heterogeneous sensors. Existing methods degrade outside the recorded trajectory because extrapolated poses provide only weak geometric support and no dense target-view supervision. The key is to explicitly expose the model to out-of-trajectory condition defects during training. We propose Geo-EVS, a geometry-conditioned framework for synthesis under sparse supervision, with two components. Geometry-Aware Reprojection (GAR) uses a fine-tuned VGGT to reconstruct colored point clouds and reproject them to both observed and virtual target poses, producing geometric condition maps; this design unifies the reprojection path between training and inference. Artifact-Guided Latent Diffusion (AGLD) injects reprojection-derived artifact masks during training, so the model learns to recover structure where geometric support is missing. For evaluation, we adopt a LiDAR-Projected Sparse-Reference (LPSR) protocol when dense extrapolated-view ground truth is unavailable. On Waymo, Geo-EVS improves sparse-view synthesis quality and geometric accuracy, especially in high-angle and low-coverage settings, and also improves downstream 3D detection.
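The reprojection step underlying GAR can be illustrated in a minimal sketch: splat a colored point cloud into a target camera with a z-buffer, and treat uncovered pixels as the artifact mask that AGLD would consume. This assumes a simple pinhole model; the function name `reproject_to_pose` and all shapes are illustrative, not the paper's API.

```python
import numpy as np

def reproject_to_pose(points_xyz, colors, K, T_world_to_cam, hw):
    """Project a colored point cloud into a virtual view (nearest point wins),
    returning a geometric condition map and an artifact (hole) mask.
    Illustrative sketch only; not the paper's implementation."""
    h, w = hw
    # Transform world points into the target camera frame.
    pts_h = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    cam = (T_world_to_cam @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 1e-6
    cam, colors = cam[in_front], colors[in_front]
    # Pinhole projection to pixel coordinates.
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, c = u[valid], v[valid], cam[valid, 2], colors[valid]
    # Z-buffer via far-to-near write order: nearer points overwrite farther ones.
    order = np.argsort(-z)
    cond = np.zeros((h, w, 3), dtype=colors.dtype)
    depth = np.full((h, w), np.inf)
    cond[v[order], u[order]] = c[order]
    depth[v[order], u[order]] = z[order]
    # Pixels with no projected point have missing geometric support.
    artifact_mask = ~np.isfinite(depth)
    return cond, artifact_mask
```

Applying the same routine to observed poses (for training targets) and to virtual poses (at inference) is what keeps the conditioning path unified between the two stages.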