If AI is the new electricity, what should we do to keep ourselves from getting electrocuted? In this work, we explore factors related to the potential of large language models (LLMs) to manipulate human decisions. We describe the results of two experiments. The first determines which human characteristics are associated with susceptibility to LLM manipulation. The second determines which LLM characteristics are associated with manipulative potential. We explore human factors through user studies in which participants answer general-knowledge questions using LLM-generated hints. We explore LLM factors by prompting language models to produce manipulative statements, and we then analyze the models' obedience, the persuasion strategies they use, and their choice of vocabulary. Based on these experiments, we discuss two actions that can protect us from LLM manipulation. In the long term, we put AI literacy at the forefront, arguing that educating society would minimize the risk of manipulation and its consequences. We also propose an ad hoc solution: a classifier that detects manipulation by LLMs, which we call a Manipulation Fuse.