This paper explores the image synthesis capabilities of GPT-4, a leading multi-modal large language model. We establish a benchmark for evaluating the fidelity of texture features in images generated by GPT-4, comprising manually painted pictures and their AI-generated counterparts. The contributions of this study are threefold. First, we provide an in-depth analysis of the fidelity of image synthesis features based on GPT-4, marking the first such study on this state-of-the-art model. Second, our quantitative and qualitative experiments reveal the limitations of the GPT-4 model in image synthesis. Third, we compile a unique benchmark of manual drawings and corresponding GPT-4-generated images, introducing a new task to advance fidelity research in AI-generated content (AIGC). The dataset will be made available upon acceptance of the paper: \url{https://github.com/rickwang28574/DeepArt}. We hope this study will fuel knowledge, scholarship, and innovation, inspire uses that transform how we discover and understand the world of art, and promote the development of AIGC while retaining respect for art.