Artificial intelligence has so far largely automated routine tasks, but what does it mean for the future of work if Large Language Models (LLMs) show creativity comparable to that of humans? To measure the creativity of LLMs holistically, the current study uses 13 creative tasks spanning three domains. We benchmark the LLMs against individual humans and also take a novel approach by comparing them to the collective creativity of groups of humans. We find that the best LLMs (Claude and GPT-4) rank in the 52nd percentile against humans; overall, LLMs excel in divergent thinking and problem solving but lag in creative writing. When queried 10 times, an LLM's collective creativity is equivalent to that of 8-10 humans. When more responses are requested, every two additional LLM responses equal one extra human. Ultimately, LLMs, when optimally applied, may compete with a small group of humans in the future of work.