Large language models appear quite creative, often performing on par with the average human on creative tasks. However, research on LLM creativity has focused almost exclusively on creative \textit{products}, with little attention to the creative \textit{process}. Process analyses of human creativity often require hand-coded categories or exploit response times, neither of which applies to LLMs. We provide an automated method to characterise how humans and LLMs explore semantic spaces on the Alternate Uses Task, and contrast this with behaviour on a Verbal Fluency Task. We use sentence embeddings to identify response categories and compute semantic similarities, from which we generate jump profiles. Our results corroborate earlier work in humans reporting both persistent (deep search in few semantic spaces) and flexible (broad search across multiple semantic spaces) pathways to creativity, with both pathways leading to similar creativity scores. LLMs were found to be biased towards either the persistent or the flexible pathway, and this bias varied across tasks. Although LLMs as a population match human profiles, the relationship between their profiles and creativity differs: more flexible models score higher on creativity. Our dataset and scripts are available on \href{https://github.com/surabhisnath/Creative_Process}{GitHub}.
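As a rough illustration of how a jump profile might be derived from embeddings, the sketch below flags a "jump" whenever two consecutive responses have low cosine similarity. It is not the procedure used in this work: the function names, the threshold of 0.5, and the toy 2-D vectors (standing in for real sentence embeddings) are all assumptions made for the example, and it uses similarity alone rather than response categories.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def jump_profile(embeddings, threshold=0.5):
    """Return one 0/1 flag per consecutive response pair.

    1 marks a jump (similarity below the threshold), 0 marks staying
    in the same semantic neighbourhood. The threshold is a hypothetical
    value chosen for illustration only.
    """
    return [
        1 if cosine_similarity(prev, curr) < threshold else 0
        for prev, curr in zip(embeddings, embeddings[1:])
    ]


# Toy vectors standing in for sentence embeddings of four responses:
# the first two are close to each other, as are the last two.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
profile = jump_profile(toy_embeddings)
num_jumps = sum(profile)
```

Under this simplified reading, a response sequence with few jumps would correspond to a more persistent profile and one with many jumps to a more flexible profile.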