Recent breakthroughs in large language models (LLMs) have brought remarkable success to the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information LLMs process is always honest, which neglects the deceptive or misleading information pervasive in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulation, potentially resulting in detrimental outcomes. This study uses the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines two contemplation processes: formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these two processes, respectively. Specifically, the first-order transition allows an LLM agent to infer others' mental states, and the second-order transition involves understanding how others perceive the agent's mental state. After integrating ReCon with different LLMs, extensive experimental results from the Avalon game indicate its efficacy in helping LLMs discern and maneuver around deceptive information without additional fine-tuning or data. Finally, we offer a possible explanation for the efficacy of ReCon and explore the current limitations of LLMs in terms of safety, reasoning, speaking style, and format, potentially furnishing insights for subsequent research.
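The two-stage structure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `LLM` callable interface, the `Turn` container, the function names, the prompt wording, and the order of calls within each stage are all assumptions made here. What it takes from the abstract is that formulation contemplation drafts a thought and a speech using a first-order perspective transition, and that refinement contemplation revises both using a second-order perspective transition.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Turn:
    thought: str  # private reasoning, not shown to other players
    speech: str   # what the agent says aloud


def formulation_contemplation(llm: LLM, role: str, history: str) -> Turn:
    """Draft an initial thought and speech (prompts are illustrative only)."""
    # First-order perspective transition: infer the other players' mental states.
    others = llm(f"[first-order] You are {role}. Discussion:\n{history}\n"
                 "Infer each other player's likely role and intent.")
    thought = llm(f"[think] You are {role}. Inferences about others:\n{others}\n"
                  "Reason privately about what to do next.")
    speech = llm(f"[speak] Private thought:\n{thought}\n"
                 "Write what you will say aloud.")
    return Turn(thought, speech)


def refinement_contemplation(llm: LLM, role: str, history: str, draft: Turn) -> Turn:
    """Polish the draft thought and speech (prompts are illustrative only)."""
    # Second-order perspective transition: estimate how the other players
    # would perceive the agent if it delivered the draft speech.
    perceived = llm(f"[second-order] You are {role}. Discussion:\n{history}\n"
                    f"Draft speech:\n{draft.speech}\n"
                    "How would the other players judge your role and intent?")
    thought = llm(f"[refine-think] Draft thought:\n{draft.thought}\n"
                  f"Likely perception:\n{perceived}\nRevise the thought.")
    speech = llm(f"[refine-speak] Revised thought:\n{thought}\n"
                 f"Draft speech:\n{draft.speech}\nRevise the speech.")
    return Turn(thought, speech)


def recon_turn(llm: LLM, role: str, history: str) -> Turn:
    """One agent turn: formulation contemplation, then refinement contemplation."""
    draft = formulation_contemplation(llm, role, history)
    return refinement_contemplation(llm, role, history, draft)


def make_stub_llm(log: List[str]) -> LLM:
    """Stand-in model for dry runs: records each prompt's tag and echoes it."""
    def stub(prompt: str) -> str:
        tag = prompt[1:prompt.index("]")]
        log.append(tag)
        return f"<{tag} output>"
    return stub


if __name__ == "__main__":
    calls: List[str] = []
    turn = recon_turn(make_stub_llm(calls), "Merlin", "Player 2: I trust Player 3.")
    print(calls)
    print(turn)
```

Because the model is passed in as a plain callable, the same pipeline could wrap different LLM backends without retraining, which matches the abstract's point that ReCon needs no extra fine-tuning or data. The stub model only makes the control flow visible; it performs no actual reasoning.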