Feed-forward 3D Gaussian Splatting (3DGS) models have recently emerged as a promising solution for novel view synthesis, enabling one-pass inference without per-scene 3DGS optimization. However, their scalability is fundamentally constrained by limited model capacity, leading to degraded performance or excessive memory consumption as the number of input views increases. In this work, we analyze feed-forward 3DGS frameworks through the lens of the Information Bottleneck principle and introduce ZPressor, a lightweight, architecture-agnostic module that efficiently compresses multi-view inputs into a compact latent state $Z$, retaining essential scene information while discarding redundancy. Concretely, ZPressor partitions the input views into anchor and support sets and uses cross-attention to compress information from the support views into the anchor views, forming the compressed latent state $Z$; this enables existing feed-forward 3DGS models to scale to over 100 input views at 480P resolution on an 80GB GPU. We show that integrating ZPressor into several state-of-the-art feed-forward 3DGS models consistently improves performance under moderate numbers of input views and enhances robustness under dense-view settings on two large-scale benchmarks, DL3DV-10K and RealEstate10K. Video results, code, and trained models are available on our project page: https://lhmd.top/zpressor.
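The abstract describes ZPressor as an anchor/support partition followed by cross-attention that fuses support-view information into the anchor views. The sketch below is a minimal, hedged illustration of that idea in PyTorch; the module name, feature dimension, uniform anchor-selection rule, and residual design are our assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' released code) of the ZPressor idea:
# split views into anchor and support sets, then use cross-attention to
# compress support-view features into the anchors, yielding the compact state Z.
import torch
import torch.nn as nn


class ZPressorSketch(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, view_feats: torch.Tensor, num_anchors: int) -> torch.Tensor:
        # view_feats: (V, N, D) per-view token features for V input views.
        # Uniform-stride anchor selection is an assumption; the paper's actual
        # assignment strategy may differ.
        V, N, D = view_feats.shape
        anchor_idx = torch.linspace(0, V - 1, num_anchors).long()
        support_mask = torch.ones(V, dtype=torch.bool)
        support_mask[anchor_idx] = False

        anchors = view_feats[anchor_idx]     # (A, N, D)
        supports = view_feats[support_mask]  # (S, N, D)

        # Flatten all support tokens into one key/value sequence shared by anchors.
        kv = supports.reshape(1, -1, D).expand(num_anchors, -1, -1)

        # Each anchor queries the support tokens; the residual keeps anchor content.
        fused, _ = self.cross_attn(query=anchors, key=kv, value=kv)
        z = self.norm(anchors + fused)       # compressed latent state Z: (A, N, D)
        return z


# Usage example: compress 12 views (64 tokens each, dim 256) into 4 anchor views.
feats = torch.randn(12, 64, 256)
z = ZPressorSketch()(feats, num_anchors=4)
print(z.shape)  # torch.Size([4, 64, 256])
```

Because the number of anchors is fixed, the size of $Z$ stays constant as more input views arrive, which is what allows memory to remain bounded in the dense-view regime described above.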