Automatically generating multiview illusions, in which a single piece of visual content offers distinct interpretations from different viewing perspectives, is a compelling challenge. Traditional methods, such as shadow art and wire art, create interesting 3D illusions but are limited to simple visual outputs (i.e., figure-ground images or line drawings), which restricts their artistic expressiveness and practical versatility. Recent diffusion-based illusion generation methods can produce more intricate designs but are confined to 2D images. In this work, we present a simple yet effective approach for creating 3D multiview illusions from user-provided text prompts or images. Our method leverages a pre-trained text-to-image diffusion model to optimize the textures and geometry of neural 3D representations through differentiable rendering, so that the resulting 3D content yields different interpretations when viewed from different angles. We develop several techniques to improve the quality of the generated 3D multiview illusions. We demonstrate the effectiveness of our approach through extensive experiments and showcase illusion generation with diverse 3D forms.
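To make the idea of optimizing one 3D representation against several views concrete, the following is a minimal toy sketch, not the method described above. It assumes a small voxel grid instead of a neural 3D representation, a hand-written orthographic absorption renderer with analytic gradients instead of a general differentiable renderer, and fixed binary target silhouettes instead of guidance from a text-to-image diffusion model. All names (`render`, `make_targets`, `optimize_illusion`) and all hyperparameters are hypothetical.

```python
import numpy as np

def softplus(x):
    # Maps unconstrained parameters to non-negative densities.
    return np.logaddexp(0.0, x)

def sigmoid(x):
    # Derivative of softplus.
    return 1.0 / (1.0 + np.exp(-x))

def render(density, axis):
    # Toy orthographic renderer: opacity = 1 - exp(-sum of density along the ray).
    return 1.0 - np.exp(-density.sum(axis=axis))

def make_targets(n=8):
    # Two different silhouettes (a square and a staircase) that share the
    # same column extent, so a single volume can satisfy both views.
    target_a = np.zeros((n, n))
    target_a[2:6, 2:6] = 1.0
    target_b = np.zeros((n, n))
    for k in range(2, 6):
        target_b[2:k + 1, k] = 1.0
    return target_a, target_b

def optimize_illusion(targets, n=8, steps=2000, lr=0.3):
    # Shared 3D parameters; view 0 looks along axis 0, view 1 along axis 1.
    theta = np.full((n, n, n), -3.0)
    losses = []
    for _ in range(steps):
        density = softplus(theta)
        grad_density = np.zeros_like(theta)
        loss = 0.0
        for axis, target in enumerate(targets):
            image = render(density, axis)
            residual = image - target
            loss += 0.5 * np.sum(residual ** 2)
            # d(loss)/d(density) is identical for every voxel on a ray,
            # so the per-pixel gradient is broadcast back along the view axis.
            grad_density += np.expand_dims(residual * (1.0 - image), axis)
        # Chain rule through softplus, then a plain gradient-descent step.
        theta -= lr * grad_density * sigmoid(theta)
        losses.append(loss)
    return softplus(theta), losses

if __name__ == "__main__":
    targets = make_targets()
    density, losses = optimize_illusion(targets)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

The per-view losses are summed before the update, so every voxel receives gradient from both viewing directions at once. In this toy setting that coupling is what lets one volume present a different image from each direction; replacing the silhouette loss with a diffusion-based objective and the voxel grid with a neural representation is outside the scope of this sketch.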