While delegating tasks to large language models (LLMs) can save people time, there is growing evidence that offloading tasks to such models produces social costs. We use behavior in two canonical economic games to study whether people hold different expectations when decisions are made by LLMs acting on their behalf rather than by themselves. More specifically, we study the social appropriateness of a spectrum of possible behaviors: when LLMs divide resources on our behalf (Dictator Game and Ultimatum Game) and when they monitor the fairness of resource splits (Ultimatum Game). We use the Krupka-Weber norm elicitation task to detect shifts in social appropriateness ratings. Results of two pre-registered and incentivized experimental studies using representative samples from the UK and US (N = 2,658) show three key findings. First, offers from machines, when no acceptance is necessary, are judged to be less appropriate than offers from humans, although there is no shift in the modal response. Second, when acceptance is necessary, it is more appropriate for a person to reject offers from machines than from humans. Third, receiving a rejection of an offer from a machine is no less socially appropriate than receiving the same rejection from a human. Overall, these results suggest that people apply different norms to machines deciding how to split resources, but are not opposed to machines enforcing those norms. The findings are consistent with offers made by machines being viewed as having both a cognitive and an emotional component.