Recent advances in diffusion models have enabled 3D generation from a single image. However, current methods often produce suboptimal results for novel views, with blurred textures and deviations from the reference image, limiting their practical applications. In this paper, we introduce HiFi-123, a method designed for high-fidelity and multi-view consistent 3D generation. Our contributions are twofold. First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods. Second, building on RGNV, we present a novel Reference-Guided State Distillation (RGSD) loss. When incorporated into the optimization-based image-to-3D pipeline, our method significantly improves 3D generation quality, achieving state-of-the-art performance. Comprehensive qualitative and quantitative evaluations demonstrate the effectiveness of our approach over existing methods. Video results are available on the project page.