As LLMs become increasingly prevalent, it is interesting to consider how ``creative'' these models can be. In cognitive science, creativity comprises at least two key characteristics: \emph{convergent} thinking (purposefulness to achieve a given goal) and \emph{divergent} thinking (adaptability to new environments or constraints) \citep{runco2003critical}. In this work, we introduce a framework for quantifying LLM creativity that incorporates both characteristics. This is achieved with (1) Denial Prompting, which pushes LLMs toward more creative solutions to a given problem by incrementally imposing new constraints on the previous solution, compelling them to adopt new strategies, and (2) the NeoGauge metric, which we define and compute to examine both convergent and divergent thinking in the creative responses generated by LLMs. We apply the proposed framework to Codeforces problems, a natural data source for collecting human coding solutions. We quantify NeoGauge for various proprietary and open-source models and find that even the most creative model, GPT-4, still falls short of demonstrating human-like creativity. We also experiment with advanced reasoning strategies (MCTS, self-correction, etc.) and observe no significant improvement in creativity. As a by-product of our analysis, we release the NeoCoder dataset for reproducing our results on future models.
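The Denial Prompting procedure described above can be sketched as a simple loop: each round forbids the technique observed in the previous solution, so the model must find a new strategy. The sketch below is a minimal illustration under stated assumptions; \texttt{query\_llm} and \texttt{extract\_technique} are hypothetical stand-ins (the paper's actual prompts and technique detector are not shown here).

```python
def query_llm(problem, constraints):
    """Hypothetical stand-in for an LLM call: returns a solution
    string that is assumed to respect the accumulated constraints."""
    return f"solution to {problem!r} avoiding {sorted(constraints)}"

def extract_technique(solution):
    """Hypothetical analyzer naming the dominant technique in a
    solution; a real implementation would inspect the generated code."""
    return f"technique#{len(solution) % 5}"

def denial_prompting(problem, rounds=3):
    """Denial Prompting loop: each round denies the technique used in
    the previous solution, compelling a new strategy next round."""
    constraints, solutions = set(), []
    for _ in range(rounds):
        solution = query_llm(problem, constraints)
        solutions.append(solution)
        constraints.add(extract_technique(solution))  # forbid it next round
    return solutions

solutions = denial_prompting("sort an array", rounds=3)
```

Each element of \texttt{solutions} is produced under a strictly larger (or equal) constraint set than the one before it, which is the property NeoGauge evaluates for convergent and divergent thinking.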