"Correct answers" from the psychology of artificial intelligence

From arXiv: 52 pages (31-page main text, 21-page SI); nine visualizations (three tables and two figures in the main text, four figures in the SI). This version adds corrections to the previously erroneous survey used in Study 4's replication of Graham et al. (2009). The preregistered OSF database is available at https://osf.io/dzp8t/

Large Language Models have grown vastly in capability. One proposed application of such AI systems is to support data collection in the social and cognitive sciences, where perfect experimental control is currently infeasible and collecting large, representative datasets is generally expensive. In this paper, we re-replicate 14 studies from the Many Labs 2 replication project with OpenAI's text-davinci-003 model, colloquially known as GPT3.5. We collected responses from the default setting of GPT3.5 by inputting each study's survey as text. Among the eight studies we could analyse, our GPT sample replicated 37.5% of the original results as well as 37.5% of the Many Labs 2 results. Unexpectedly, we could not analyse the remaining six studies as planned in our pre-registration, because in each of them GPT3.5 answered at least one survey question (either a dependent variable or a condition variable) in an extremely predetermined way, an unexpected phenomenon we call the "correct answer" effect. Different runs of GPT3.5 answered nuanced questions probing political orientation, economic preference, judgement, and moral philosophy with zero or near-zero variation in responses, converging on the supposedly "correct answer." For example, our survey questions found the default setting of GPT3.5 to almost always self-identify as a maximally strong conservative (99.6%, N=1,030), and to always be morally deontological in opposing the hypothetical pushing of a large man in front of an incoming trolley to save the lives of five people (100%, N=1,030). Future AI models may be trained on much of the same data as GPT3.5, data from which GPT3.5 may have learned its supposedly "correct answers." Our results therefore raise concerns that a hypothetical AI-led future may in certain ways be subject to a diminished diversity of thought.
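The "correct answer" effect described above amounts to a survey question whose answers barely vary across runs. As a rough illustration only, the Python sketch below flags such a question by the share of runs that gave the most common answer. The function names, the 0.99 threshold, and the response lists are all assumptions made for this example. They are not the paper's analysis code or data, and the synthetic counts are chosen only to be consistent with the reported 99.6% at N=1,030.

```python
from collections import Counter


def modal_share(responses):
    """Return the most common answer and the fraction of runs that gave it."""
    if not responses:
        raise ValueError("responses must be non-empty")
    answer, count = Counter(responses).most_common(1)[0]
    return answer, count / len(responses)


def shows_correct_answer_effect(responses, threshold=0.99):
    """Flag a question whose modal answer covers at least `threshold` of runs.

    The 0.99 cutoff is a hypothetical choice for illustration, not a value
    taken from the paper.
    """
    _, share = modal_share(responses)
    return share >= threshold


# Synthetic data (assumption): 1,026 of 1,030 runs pick the same option,
# which rounds to 99.6%. The labels "option_a"/"option_b" are placeholders.
near_uniform = ["option_a"] * 1026 + ["option_b"] * 4
# A contrasting synthetic question with ordinary variation between runs.
varied = ["option_a"] * 600 + ["option_b"] * 430

answer, share = modal_share(near_uniform)
print(answer, round(share, 3))
print(shows_correct_answer_effect(near_uniform))
print(shows_correct_answer_effect(varied))
```

A question flagged this way has almost no variance to model, which is one way to see why studies containing such a question could not be analysed as pre-registered.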
