Recently, Gaussian splatting has demonstrated significant success in novel view synthesis. Current methods often regress Gaussians with pixel or point cloud correspondence, linking each Gaussian to a pixel or a 3D point. This constraint leads to redundant Gaussians that overfit the correspondence rather than the objects the 3D Gaussians represent, wasting resources and yielding inaccurate geometry or texture. In this paper, we introduce LeanGaussian, a novel approach that treats each query in a deformable Transformer as one 3D Gaussian ellipsoid, breaking the pixel or point cloud correspondence constraints. We leverage a deformable decoder to iteratively refine the Gaussians layer by layer, with image features as keys and values. Notably, the center of each 3D Gaussian serves as a 3D reference point, which is projected onto the image for deformable attention in 2D space. On both the ShapeNet SRN dataset (category level) and the Google Scanned Objects dataset (open-category level, trained on the Objaverse dataset), our approach outperforms prior methods by approximately 6.1\%, achieving PSNRs of 25.44 and 22.36, respectively. Additionally, our method achieves a 3D reconstruction speed of 7.2 FPS and a rendering speed of 500 FPS. The code will be released at https://github.com/jwubz123/DIG3D.