If AI is the new electricity, what should we do to keep ourselves from getting electrocuted? In this work, we explore factors related to the potential of large language models (LLMs) to manipulate human decisions. We describe the results of two experiments designed to determine which human characteristics are associated with susceptibility to LLM manipulation, and which LLM characteristics are associated with manipulative potential. We explore human factors through user studies in which participants answer general knowledge questions using LLM-generated hints, and LLM factors by provoking language models to produce manipulative statements. We then analyze participants' obedience, the persuasion strategies the models used, and their choice of vocabulary. Based on these experiments, we discuss two actions that can protect us from LLM manipulation. In the long term, we put AI literacy at the forefront, arguing that educating society would minimize the risk of manipulation and its consequences. We also propose an ad hoc solution: a classifier that detects manipulation by LLMs, a Manipulation Fuse.