This paper describes the Qualcomm AI Research solution to the RealADSim-NVS challenge, hosted at the RealADSim Workshop at ICCV 2025. The challenge concerns novel view synthesis in street scenes: starting from car-centric frames captured during a set of training traversals, participants must render the same urban environment as viewed from a different traversal (e.g., a different street lane or driving direction). Our solution is inspired by hybrid methods in scene generation and generative simulators that combine Gaussian splatting with diffusion models, and it consists of two stages. First, we fit a 3D reconstruction of the scene and render novel views as seen from the target cameras. Then, we enhance the resulting frames with a dedicated single-step diffusion model. We discuss specific choices made in the initialization of the Gaussian primitives, the finetuning of the enhancer model, and the curation of its training data. We report the performance of our model design and ablate its components in terms of novel view quality, as measured by PSNR, SSIM, and LPIPS. On the public leaderboard reporting test results, our proposal reaches an aggregated score of 0.432, placing second overall.
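For reference, PSNR, one of the three quality metrics named above, can be sketched as follows. This is a minimal illustration only: the function name `psnr`, the flat-sequence input, and the assumption that pixel values are normalized to [0, 1] are ours and are not taken from the challenge evaluation code.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two flat pixel sequences.

    Hypothetical helper: assumes both inputs hold the same number of
    pixel values, scaled so that the brightest possible value is max_value.
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error between the rendered and the reference pixels.
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)

    # Identical images have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR indicates a render closer to the reference frame; SSIM and LPIPS complement it by measuring structural and perceptual similarity, respectively.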