We present Turbo3D, an ultra-fast text-to-3D system capable of generating high-quality Gaussian splatting assets in under one second. Turbo3D employs a rapid 4-step, 4-view diffusion generator and an efficient feed-forward Gaussian reconstructor, both operating in latent space. The 4-step, 4-view generator is a student model distilled through a novel Dual-Teacher approach, which encourages the student to learn view consistency from a multi-view teacher and photo-realism from a single-view teacher. By shifting the Gaussian reconstructor's inputs from pixel space to latent space, we eliminate the extra image decoding time and halve the transformer sequence length for maximum efficiency. Our method demonstrates superior 3D generation results compared to previous baselines, while operating in a fraction of their runtime.