This work explores a simple yet powerful lightweight adapter design for feed-forward 3D Gaussian Splatting (3DGS). Existing methods typically layer complex, architecture-specific designs on top of the generic pipeline of image feature extraction $\rightarrow$ multi-view interaction $\rightarrow$ feature decoding. However, constrained by the limited scale of 3D training data and the low-pass filtering effect of deep networks, these methods still fall short in cross-domain generalization and high-frequency geometric fidelity. To address these problems, we propose AdaptSplat, which shows that, without complex component engineering, adding a single adapter of only 1.5M parameters to the generic architecture is sufficient to achieve superior performance. Specifically, we design a lightweight Frequency-Preserving Adapter (FPA) that extracts direction-aware high-frequency structural priors from the shallow features of a powerful vision foundation model backbone and integrates them into the generic pipeline via high-frequency positional encodings and adaptive residual modulation. This compensates for the high-frequency attenuation caused by over-smoothing in deep features, improving the fitting accuracy of Gaussian primitives on complex surfaces and sharp boundaries. Extensive experiments show that AdaptSplat achieves state-of-the-art feed-forward reconstruction performance on multiple standard benchmarks, with stable generalization across domains. Code is available at: https://github.com/xmw666/AdaptSplat.
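To make the three ingredients named above concrete (direction-aware high-frequency extraction, high-frequency positional encoding, and adaptive residual modulation), the following is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the FPA architecture, so the central-difference filters, the sinusoidal encoding, the sigmoid gate, and all function and parameter names (`directional_high_pass`, `high_freq_positional_encoding`, `adapter_forward`, `w_proj`, `w_gate`) are illustrative assumptions. It also assumes the shallow and deep feature maps share the same spatial resolution.

```python
import numpy as np


def directional_high_pass(feat):
    """Horizontal and vertical central differences of a (C, H, W) feature map.

    Assumption: simple finite differences stand in for whatever
    direction-aware high-frequency extractor the real adapter uses.
    Returns an array of shape (2, C, H, W): [d/dx, d/dy].
    """
    padded = np.pad(feat, ((0, 0), (1, 1), (1, 1)), mode="edge")
    dx = 0.5 * (padded[:, 1:-1, 2:] - padded[:, 1:-1, :-2])
    dy = 0.5 * (padded[:, 2:, 1:-1] - padded[:, :-2, 1:-1])
    return np.stack([dx, dy])


def high_freq_positional_encoding(h, w, num_bands=4):
    """Sinusoidal encoding of pixel coordinates, shape (4 * num_bands, H, W).

    Assumption: octave-spaced frequencies, as in common Fourier-feature
    encodings; the paper's exact encoding is not given in the abstract.
    """
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    channels = []
    for k in range(num_bands):
        f = (2.0 ** k) * np.pi
        channels += [np.sin(f * xx), np.cos(f * xx), np.sin(f * yy), np.cos(f * yy)]
    return np.stack(channels)


def adapter_forward(shallow, deep, w_proj, w_gate):
    """Inject a high-frequency prior into deep features via a gated residual.

    shallow: (C, H, W) shallow backbone features (hypothetical input)
    deep:    (D, H, W) deep, over-smoothed features (hypothetical input)
    w_proj:  (D, 2 * C + P) weights of a 1x1 projection, P = encoding channels
    w_gate:  (D, D) weights of a 1x1 gate predicted from the deep features
    """
    c, h, w = shallow.shape
    hf = directional_high_pass(shallow).reshape(2 * c, h, w)
    pe = high_freq_positional_encoding(h, w)
    x = np.concatenate([hf, pe], axis=0)
    prior = np.einsum("dk,khw->dhw", w_proj, x)           # 1x1 convolution
    gate_logits = np.einsum("dk,khw->dhw", w_gate, deep)  # 1x1 convolution
    gate = 1.0 / (1.0 + np.exp(-gate_logits))             # per-pixel gate in (0, 1)
    return deep + gate * prior                            # residual modulation


# Usage with random placeholder features and weights.
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))
deep = rng.standard_normal((12, 16, 16))
w_proj = 0.01 * rng.standard_normal((12, 2 * 8 + 16))
w_gate = 0.01 * rng.standard_normal((12, 12))
out = adapter_forward(shallow, deep, w_proj, w_gate)
print(out.shape)
```

Because the prior enters only through a gated residual, a zero projection leaves the deep features unchanged. Under these assumptions, such an adapter can be added to an existing pipeline without altering its initial behavior.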