This paper explores the image synthesis capabilities of GPT-4, a leading multi-modal large language model. We establish a benchmark for evaluating the fidelity of texture features in images generated by GPT-4, comprising manually painted pictures and their AI-generated counterparts. The contributions of this study are threefold. First, we provide an in-depth analysis of the fidelity of features in images synthesized by GPT-4, the first such study of this state-of-the-art model. Second, our quantitative and qualitative experiments reveal the limitations of the GPT-4 model in image synthesis. Third, we compile a unique benchmark of manual drawings and corresponding GPT-4-generated images, introducing a new task to advance fidelity research in AI-generated content (AIGC). The dataset is available at: \url{https://github.com/rickwang28574/DeepArt}.