Assessing and Understanding Creativity in Large Language Models

In the field of natural language processing, the rapid development of large language models (LLMs) has attracted increasing attention. LLMs have shown a high level of creativity in various tasks, but the methods for assessing such creativity remain inadequate. Assessing LLM creativity requires accounting for differences from humans and calls for multi-dimensional measurement that balances accuracy and efficiency. This paper aims to establish an efficient framework for assessing the level of creativity in LLMs. By adapting the modified Torrance Tests of Creative Thinking, the research evaluates the creative performance of various LLMs across 7 tasks, emphasizing 4 criteria: Fluency, Flexibility, Originality, and Elaboration. In this context, we develop a comprehensive dataset of 700 questions for testing and an LLM-based evaluation method. In addition, this study presents a novel analysis of LLMs' responses to diverse prompts and role-play situations. We found that the creativity of LLMs primarily falls short in originality, while excelling in elaboration. Moreover, the use of prompts and the role-play settings of the model significantly influence creativity. The experimental results also indicate that collaboration among multiple LLMs can enhance originality. Notably, our findings reveal a consensus between human evaluations and LLMs regarding the personality traits that influence creativity. The findings underscore the significant impact of LLM design on creativity and bridge artificial intelligence and human creativity, offering insights into LLMs' creativity and potential applications.
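To make the four-criterion scoring concrete, here is a minimal sketch of how per-response scores might be averaged per criterion across tasks. It is an illustration under assumptions, not the paper's evaluation method: the `ScoredResponse` type, the function names, the task labels, the numeric scale, and the example scores are all hypothetical placeholders. Only the four criterion names come from the abstract.

```python
from dataclasses import dataclass
from statistics import mean

# The four criteria named in the abstract.
CRITERIA = ("fluency", "flexibility", "originality", "elaboration")


@dataclass
class ScoredResponse:
    """One model response with a score per criterion (hypothetical structure)."""
    task: str                 # hypothetical task label
    scores: dict[str, float]  # criterion name -> score on an assumed numeric scale


def mean_by_criterion(responses: list[ScoredResponse]) -> dict[str, float]:
    """Average each criterion over all scored responses."""
    return {c: mean(r.scores[c] for r in responses) for c in CRITERIA}


def weakest_and_strongest(averages: dict[str, float]) -> tuple[str, str]:
    """Return the criteria with the lowest and highest average score."""
    return min(averages, key=averages.get), max(averages, key=averages.get)


# Made-up placeholder scores, not results from the paper.
example = [
    ScoredResponse("task_a", {"fluency": 4, "flexibility": 3, "originality": 2, "elaboration": 5}),
    ScoredResponse("task_b", {"fluency": 3, "flexibility": 4, "originality": 1, "elaboration": 4}),
]
averages = mean_by_criterion(example)
print(averages)
print(weakest_and_strongest(averages))
```

Reporting the averages per criterion, rather than collapsing them into a single number, keeps a pattern such as low originality alongside high elaboration visible.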

