Large language models (LLMs) are transforming research on machine learning while galvanizing public debate. Understanding not only when these models work well and succeed but also why they fail and misbehave is of great societal relevance. We propose to turn the lens of psychiatry, a framework used to describe and modify maladaptive behavior, onto the outputs produced by these models. We focus on twelve established LLMs and subject them to a questionnaire commonly used in psychiatry. Our results show that six of the latest LLMs respond robustly to the anxiety questionnaire, producing anxiety scores comparable to those of humans. Moreover, the LLMs' responses can be predictably changed by using anxiety-inducing prompts. Anxiety induction not only influences LLMs' scores on an anxiety questionnaire but also influences their behavior in a previously established benchmark measuring biases such as racism and ageism. Importantly, more strongly anxiety-inducing text leads to larger increases in biases, suggesting that how anxiously a prompt is communicated to large language models has a strong influence on their behavior in applied settings. These results demonstrate the usefulness of methods taken from psychiatry for studying the capable algorithms to which we increasingly delegate authority and autonomy.