Large language models (LLMs) have shown promise in replicating human-like behavior in crowdsourcing tasks that were previously thought to be exclusive to human abilities. However, current efforts focus mainly on simple atomic tasks. We explore whether LLMs can replicate more complex crowdsourcing pipelines. We find that modern LLMs can simulate some of crowdworkers' abilities in these "human computation algorithms," but the level of success varies and is influenced by requesters' understanding of LLM capabilities, the specific skills required for sub-tasks, and the optimal interaction modality for performing these sub-tasks. We reflect on humans' and LLMs' different sensitivities to instructions, stress the importance of enabling human-facing safeguards for LLMs, and discuss the potential of training humans and LLMs with complementary skill sets. Crucially, we show that replicating crowdsourcing pipelines offers a valuable platform to investigate (1) the relative strengths of LLMs on different tasks (by cross-comparing their performance on sub-tasks) and (2) LLMs' potential in complex tasks, where they can complete some sub-tasks while leaving others to humans.