Large language models have recently been shown to perform well on a wide range of intellectual tasks. However, few studies have examined whether their behavior aligns with human behavior on tasks that involve sensibility, such as aesthetic evaluation. This study investigates the performance of GPT-4 with Vision, a state-of-the-art language model that accepts image input, on the aesthetic evaluation of images. We employ two tasks: predicting the average aesthetic ratings of a group and predicting the ratings of an individual. We examine the performance of GPT-4 with Vision by exploring prompts and analyzing its prediction behavior. Experimental results show that GPT-4 with Vision performs strongly in predicting aesthetic evaluations and that its responses to beauty and to ugliness differ in nature. Finally, we discuss how to develop an AI system for aesthetic evaluation that is grounded in scientific knowledge of human beauty perception and that uses agent technologies to integrate traditional deep learning models with large language models.