Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, as images provide more intuitive guidance for the 3D generation process. In this work, we delve into the potential of using multiple image prompts, instead of a single image prompt, for 3D generation. Specifically, we build on ImageDream, a novel image-prompt multi-view diffusion model, to support multi-view images as the input prompt. Our method, dubbed MultiImageDream, reveals that transitioning from a single-image prompt to multiple-image prompts enhances the performance of multi-view and 3D object generation according to various quantitative evaluation metrics and qualitative assessments. This advancement is achieved without fine-tuning the pre-trained ImageDream multi-view diffusion model.
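One simple way a pre-trained single-image-prompt diffusion model could accept multiple image prompts at inference time without fine-tuning is to fuse the per-image conditioning embeddings into one vector, for example by averaging. The sketch below illustrates that idea only; the averaging rule and the `combine_image_prompts` helper are illustrative assumptions, not the method stated in this abstract.

```python
import numpy as np

def combine_image_prompts(image_embeddings):
    """Fuse per-image prompt embeddings into a single conditioning vector.

    Hypothetical sketch: averages the embeddings of several image prompts
    so a model conditioned on one embedding can use many, with no retraining.
    """
    stacked = np.stack(image_embeddings, axis=0)  # (num_images, embed_dim)
    return stacked.mean(axis=0)                   # (embed_dim,)

# Example: two 4-dim embeddings from two prompt images (toy values).
fused = combine_image_prompts([np.ones(4), np.zeros(4)])
```

A weighted average (e.g. favoring the frontal view) would be an equally plausible training-free fusion rule.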