We introduce the Splatter Image, an ultra-efficient approach for monocular 3D object reconstruction. Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images. We apply Gaussian Splatting to monocular reconstruction by learning a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS. Our main innovation is the surprisingly straightforward design of this network, which, using 2D operators, maps the input image to one 3D Gaussian per pixel. The resulting set of Gaussians thus has the form of an image, the Splatter Image. We further extend the method to take several images as input via cross-view attention. Owing to the speed of the renderer (588 FPS), we use a single GPU for training while generating entire images at each iteration to optimize perceptual metrics like LPIPS. On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works. Code, models, demo and more results are available at https://szymanowiczs.github.io/splatter-image.
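To make the "one 3D Gaussian per pixel" idea concrete, the following is a minimal NumPy sketch of how an image-shaped network output could be decoded into a set of Gaussians. It is an illustration under stated assumptions, not the released implementation: the 15-channel layout, the activations, the pinhole camera with a centred principal point, and the names `CHANNELS`, `split_channels` and `splatter_to_gaussians` are all hypothetical.

```python
import numpy as np

# Hypothetical channel layout of the per-pixel output map (an assumption,
# not taken from the text): depth, 3D offset, opacity, scale, rotation, colour.
CHANNELS = {"depth": 1, "offset": 3, "opacity": 1,
            "scale": 3, "rotation": 4, "color": 3}
NUM_CHANNELS = sum(CHANNELS.values())  # 15


def split_channels(splatter):
    """Split a (C, H, W) map into named per-pixel parameter maps."""
    out, start = {}, 0
    for name, width in CHANNELS.items():
        out[name] = splatter[start:start + width]
        start += width
    return out


def splatter_to_gaussians(splatter, focal):
    """Decode a (C, H, W) map into H*W Gaussians, one per pixel.

    Assumes a pinhole camera with focal length `focal` (in pixels) and the
    principal point at the image centre.
    """
    c, h, w = splatter.shape
    if c != NUM_CHANNELS:
        raise ValueError(f"expected {NUM_CHANNELS} channels, got {c}")
    p = split_channels(splatter)

    # Pixel-centre coordinates relative to the principal point.
    v, u = np.meshgrid(np.arange(h) + 0.5 - h / 2,
                       np.arange(w) + 0.5 - w / 2, indexing="ij")
    # One camera ray per pixel, scaled so that its z-component is 1.
    rays = np.stack([u / focal, v / focal, np.ones_like(u)], axis=0)

    depth = np.exp(p["depth"])            # exp keeps depth positive
    means = rays * depth + p["offset"]    # 3D centre of each Gaussian

    opacity = 1.0 / (1.0 + np.exp(-p["opacity"]))  # sigmoid, in (0, 1)
    scale = np.exp(p["scale"])                     # positive scales
    norm = np.linalg.norm(p["rotation"], axis=0, keepdims=True)
    rotation = p["rotation"] / (norm + 1e-12)      # unit quaternion

    def flat(x):
        # (K, H, W) -> (H*W, K): one row per pixel/Gaussian.
        return x.reshape(x.shape[0], -1).T

    return {"means": flat(means), "opacity": flat(opacity),
            "scale": flat(scale), "rotation": flat(rotation),
            "color": flat(p["color"])}
```

For a map of shape `(15, H, W)`, `splatter_to_gaussians(splatter, focal)["means"]` is an array of shape `(H*W, 3)`. Because every quantity stays aligned with the pixel grid until the final flattening step, such a map can be produced by ordinary 2D image-to-image operators.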