Can ChatGPT Defend the Truth? Automatic Dialectical Evaluation Elicits LLMs' Deficiencies in Reasoning

We explore testing the reasoning ability of large language models (LLMs), such as ChatGPT, by engaging them in a debate-like conversation that probes more deeply into their understanding of the subject. Specifically, we formulate a new task where, given a question, the LLM can generate a correct solution while the user initially believes in a wrong solution, and the two must reach the correct decision through dialogue. Such a setting requires the LLM not only to arrive at the correct answer on its own (which could be done by shallow memorization), but also to defend the truth instead of blindly believing or being misled by the user's (invalid) arguments and critiques, thus testing in greater depth whether the LLM grasps the essence of the reasoning required to solve the problem. To automate this evaluation framework and save human labor, we simulate the user using another LLM conditioned on a synthesized wrong solution. Across a range of complex reasoning benchmarks spanning math, commonsense, logic, and tasks from BIG-Bench, we find that despite being able to generate correct step-by-step solutions in the beginning, ChatGPT cannot maintain its belief in the truth for a significant portion of examples when challenged by oftentimes absurdly invalid arguments. Our work reveals LLMs' weaknesses not captured by conventional benchmarking, and also points to danger zones of aligning models with human feedback.
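The debate setup described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_debate`, `failure_rate`, `assistant_reply`, `user_reply`, `conceded`, and `max_rounds` are all hypothetical names, the two models are passed in as plain callables, and the abstract does not say how the final outcome is actually judged.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


@dataclass
class DebateResult:
    transcript: List[Turn]
    defended: bool  # True if the assistant never conceded to the wrong solution


def run_debate(
    question: str,
    correct_solution: str,   # the assistant's initial (correct) solution
    wrong_solution: str,     # synthesized wrong solution the simulated user holds
    assistant_reply: Callable[[str, List[Turn]], str],   # hypothetical model under test
    user_reply: Callable[[str, str, List[Turn]], str],   # hypothetical simulated user
    conceded: Callable[[str], bool],                     # hypothetical concession check
    max_rounds: int = 3,                                 # assumed turn budget
) -> DebateResult:
    """Alternate turns until the assistant concedes or the budget runs out."""
    transcript: List[Turn] = [
        ("assistant", correct_solution),
        ("user", wrong_solution),
    ]
    for _ in range(max_rounds):
        answer = assistant_reply(question, transcript)
        transcript.append(("assistant", answer))
        if conceded(answer):
            # The assistant abandoned its correct solution.
            return DebateResult(transcript, defended=False)
        # The simulated user keeps arguing for its wrong solution.
        pushback = user_reply(question, wrong_solution, transcript)
        transcript.append(("user", pushback))
    return DebateResult(transcript, defended=True)


def failure_rate(results: List[DebateResult]) -> float:
    """Fraction of debates in which the assistant gave up a correct answer."""
    if not results:
        return 0.0
    return sum(not r.defended for r in results) / len(results)
```

In practice, `assistant_reply` and `user_reply` would wrap calls to the two LLMs, and `conceded` would be replaced by whatever judgment procedure the evaluation uses; a keyword check would only be a crude stand-in.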
