ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4

In recent years, large language models (LLMs) have made significant progress in natural language processing (NLP), with models such as ChatGPT and GPT-4 achieving impressive capabilities across a range of linguistic tasks. However, training models at such a scale is challenging, and datasets that match the model's scale are often difficult to find. Fine-tuning, and training models with fewer parameters using novel methods, have emerged as promising approaches to these challenges. One such model is MiniGPT-4, which achieves vision-language understanding comparable to GPT-4 by leveraging novel pre-training models and innovative training strategies. However, the model still faces challenges in image understanding, particularly with artistic pictures. A novel multimodal model called ArtGPT-4 has been proposed to address these limitations. ArtGPT-4 was trained on image-text pairs using a Tesla A100 device in just 2 hours, using only about 200 GB of data. The model can depict images with an artistic flair and generate visual code, including aesthetically pleasing HTML/CSS web pages. Furthermore, the article proposes novel benchmarks for evaluating the performance of vision-language models. In the subsequent evaluation, ArtGPT-4 scored more than 1 point higher than the current state-of-the-art model and was only 0.25 points lower than artists on a 6-point scale. Our code and pre-trained model are available at https://huggingface.co/Tyrannosaurus/ArtGPT-4.

