Generative AI has advanced the ability to render photorealistic or artistic images, yet it remains limited in a key aspect of human creativity: interpreting ambiguous shapes. This ability, rooted in pareidolia, allows humans to perceive meaningful forms in random patterns such as clouds, stones, or leaves. To computationally replicate this imaginative process, we introduce Visual Retrieval-Augmented Generation (Visual-RAG), a framework that generates animal art directly from natural silhouettes. Our method retrieves structurally similar animal shapes from a curated corpus of 28,586 high-quality silhouettes and uses them as reference exemplars to guide diffusion-based generation with ControlNet and IP-Adapter. Ablation studies confirm that Shape Context with RANSAC provides the most accurate alignment, while removing shape standardization reduces the inlier ratio to just 13.4%, underscoring the importance of structural fidelity in Visual-RAG. In a user study, 12 participants rated the outputs on aesthetics, silhouette fidelity, and overall impression. The results indicate that Visual-RAG produces plausible interpretations but that achieving high perceptual impact remains a challenge. This work lays the foundation for computational pareidolia, showing how machines can contribute to the early stages of imaginative discovery.
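The inlier ratio mentioned above can be illustrated with a minimal sketch. It assumes matched 2D point pairs are already available (for example from Shape Context matching, which is not implemented here) and that alignment is modeled as a 2D similarity transform; the abstract does not state the transform model, the threshold, or the iteration count, so these are illustrative assumptions, and the function names `fit_similarity`, `apply_similarity`, and `ransac_inlier_ratio` are hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2D similarity (scale, rotation, translation) mapping src to dst.

    Points are treated as complex numbers so that dst ≈ a * src + b.
    Assumption: the abstract does not specify which transform model is used.
    """
    s = src[:, 0] + 1j * src[:, 1]
    d = dst[:, 0] + 1j * dst[:, 1]
    s_mean, d_mean = s.mean(), d.mean()
    s_centered = s - s_mean
    denom = np.vdot(s_centered, s_centered).real
    if denom < 1e-12:  # degenerate sample (coincident points)
        return None
    a = np.vdot(s_centered, d - d_mean) / denom  # complex scale + rotation
    b = d_mean - a * s_mean                      # complex translation
    return a, b

def apply_similarity(model, pts):
    """Apply the complex similarity model (a, b) to an (N, 2) array of points."""
    a, b = model
    z = a * (pts[:, 0] + 1j * pts[:, 1]) + b
    return np.column_stack([z.real, z.imag])

def ransac_inlier_ratio(src, dst, threshold=0.05, iterations=200, seed=0):
    """Fraction of matched pairs consistent with the best RANSAC-estimated transform.

    threshold and iterations are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(src)
    best_inliers = 0
    for _ in range(iterations):
        # Two point pairs are the minimal sample for a 2D similarity transform.
        idx = rng.choice(n, size=2, replace=False)
        model = fit_similarity(src[idx], dst[idx])
        if model is None:
            continue
        residuals = np.linalg.norm(apply_similarity(model, src) - dst, axis=1)
        best_inliers = max(best_inliers, int((residuals < threshold).sum()))
    return best_inliers / n
```

Under these assumptions, a low ratio means that few of the matched contour points agree with any single global transform, which is one way to read the drop reported when shape standardization is removed.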