In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast, high-quality rendering. To address the shortcomings of existing methods, this paper introduces GVGEN, a novel diffusion-based framework designed to efficiently generate 3D Gaussian representations from text input. We propose two techniques: (1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points into a structured form, GaussianVolume. This transformation allows intricate texture details to be captured within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a pruning and densifying method named the Candidate Pool Strategy, which enhances detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and enable the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline that first constructs a basic geometric structure and then predicts the complete Gaussian attributes. GVGEN outperforms existing 3D generation methods in both qualitative and quantitative assessments while maintaining a fast generation speed ($\sim$7 seconds), striking a balance between quality and efficiency.
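To make the idea of a volume holding a fixed number of Gaussians more concrete, the following is a minimal sketch of how unordered Gaussians might be packed into a dense grid. It is an illustration under stated assumptions, not the method used by GVGEN: the 14-channel layout, the function name `pack_gaussians`, the unit-cube normalization, and the "first Gaussian wins" collision rule are all hypothetical, and the Candidate Pool Strategy is not implemented here.

```python
import numpy as np

# Hypothetical channel layout (an assumption, not taken from the abstract):
# 3 position offset + 3 scale + 4 rotation quaternion + 1 opacity + 3 RGB color.
CHANNELS = 14


def pack_gaussians(positions, attributes, resolution):
    """Scatter unordered Gaussians into a fixed (R, R, R, CHANNELS) grid.

    positions:  (M, 3) array of centers inside the unit cube [0, 1)^3.
    attributes: (M, 11) array holding scale, rotation, opacity, and color.
    Returns the volume and a boolean occupancy mask of shape (R, R, R).
    """
    positions = np.asarray(positions, dtype=np.float32)
    attributes = np.asarray(attributes, dtype=np.float32)
    volume = np.zeros((resolution,) * 3 + (CHANNELS,), dtype=np.float32)
    occupied = np.zeros((resolution,) * 3, dtype=bool)

    # Index of the voxel that contains each Gaussian center.
    idx = np.clip(np.floor(positions * resolution).astype(int), 0, resolution - 1)
    # Voxel centers expressed in the same unit-cube coordinates.
    centers = (idx + 0.5) / resolution

    for i, (x, y, z) in enumerate(idx):
        if occupied[x, y, z]:
            continue  # simplistic placeholder rule: the first Gaussian keeps the voxel
        volume[x, y, z, :3] = positions[i] - centers[i]  # offset from the voxel center
        volume[x, y, z, 3:] = attributes[i]
        occupied[x, y, z] = True
    return volume, occupied
```

Because the output always has the same shape regardless of how many Gaussians come in, it can be handled as a dense tensor, which is presumably what makes a structured form convenient for a generative model. A coarse-to-fine scheme could, under the same assumptions, first predict something like the `occupied` mask and then fill in the remaining channels.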