Hybrid Gaussian Splatting for Novel Urban View Synthesis

This paper describes the Qualcomm AI Research solution to the RealADSim-NVS challenge, hosted at the RealADSim Workshop at ICCV 2025. The challenge concerns novel view synthesis in street scenes: starting from car-centric frames captured during a set of training traversals, participants must render the same urban environment as viewed from a different traversal (e.g. a different street lane or driving direction). Our solution is inspired by hybrid methods in scene generation and generative simulators that merge Gaussian splatting and diffusion models, and it is composed of two stages. First, we fit a 3D reconstruction of the scene and render novel views as seen from the target cameras. Then, we enhance the resulting frames with a dedicated single-step diffusion model. We discuss specific choices made in the initialization of the Gaussian primitives, as well as the finetuning of the enhancer model and the curation of its training data. We report the performance of our model design and ablate its components in terms of novel view quality, as measured by PSNR, SSIM and LPIPS. On the public leaderboard reporting test results, our proposal reaches an aggregated score of 0.432, placing second overall.
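Of the three quality metrics named in the abstract, PSNR is the simplest to state in code. The following is a minimal sketch, not the challenge's official evaluation code: it assumes images are NumPy arrays with values in [0, 1], and the function name `psnr` and the parameter `max_value` are illustrative choices. How the leaderboard combines PSNR, SSIM and LPIPS into its aggregated score is not specified here and is not reproduced.

```python
import numpy as np


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumption: pixel values lie in [0, max_value]. Higher is better.
    """
    reference = np.asarray(reference, dtype=np.float64)
    rendered = np.asarray(rendered, dtype=np.float64)
    if reference.shape != rendered.shape:
        raise ValueError("images must have the same shape")

    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - rendered) ** 2)
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))


if __name__ == "__main__":
    # Toy example: a ground-truth frame and a render with a uniform 0.1 error.
    ground_truth = np.zeros((4, 4, 3))
    render = np.full((4, 4, 3), 0.1)
    print(f"PSNR: {psnr(ground_truth, render):.2f} dB")
```

In a two-stage setup like the one described, the same function could be applied to the raw splatting renders and to the enhanced frames, to check whether the enhancement stage improves pixel-level fidelity against held-out ground truth.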

