We use an online experiment with a real work task to study whether workers change their behavior when they know their work will be judged by AI rather than by humans. We find that individuals produce a higher quantity of output when they are assigned an AI evaluator. However, controlling for quantity, the quality of their output is lower, regardless of whether quality is measured by human or LLM graders. We also find that workers are more likely to use external tools, including LLMs, when they know AI will judge their work. However, the increase in external tool use does not appear to explain the differences in quantity or quality across treatments.