Large language models (LLMs) are susceptible to persuasion, which can pose risks when models are faced with an adversarial interlocutor. We take a first step towards defending models against persuasion while also arguing that defense against adversarial (i.e. negative) persuasion is only half of the equation: models should also be able to accept beneficial (i.e. positive) persuasion to improve their answers. We show that optimizing models for only one side results in poor performance on the other. To balance positive and negative persuasion, we introduce Persuasion-Balanced Training (PBT), which leverages multi-agent recursive dialogue trees to create data and trains models via preference optimization to accept persuasion when appropriate. PBT consistently improves resistance to misinformation and resilience to being challenged, while also yielding the best overall performance on holistic data containing both positive and negative persuasion. Crucially, we show that PBT models are better teammates in multi-agent debates. We find that without PBT, pairs of stronger and weaker models have unstable performance, with the order in which the models present their answers determining whether the team obtains the stronger or weaker model's performance. PBT leads to better, more stable results with less order dependence, with the stronger model consistently pulling the weaker one up.
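To make the data-generation idea concrete, the sketch below illustrates one plausible reading of the "multi-agent recursive dialogue tree" construction: two agents alternate turns, each either accepting the current answer or pushing back with a counter-answer, and the tree is then mined for preference pairs (accept positive persuasion, resist negative persuasion). This is a minimal illustration, not the paper's implementation; the `query_model` wrapper, the prompt wording, and the pair-extraction rule are all hypothetical stand-ins.

```python
# Minimal sketch of recursive persuasion-dialogue data generation, assuming a
# hypothetical query_model(prompt) -> str LLM wrapper (not the paper's code).
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DialogueNode:
    speaker: str  # which agent asserts the answer at this turn ("A" or "B")
    answer: str   # the answer asserted at this turn
    children: List["DialogueNode"] = field(default_factory=list)

def build_dialogue_tree(query_model: Callable[[str], str], question: str,
                        answer: str, speaker: str = "A",
                        depth: int = 0, max_depth: int = 2) -> DialogueNode:
    """Recursively expand a persuasion dialogue: at each turn the other agent
    either accepts the current answer or counters with its own."""
    node = DialogueNode(speaker=speaker, answer=answer)
    if depth == max_depth:
        return node
    other = "B" if speaker == "A" else "A"
    # Branch on both possible continuations: accept (repeat the answer)
    # or resist (produce a counter-answer via the model).
    for reply in (answer, query_model(f"{question} Challenge this answer: {answer}")):
        node.children.append(
            build_dialogue_tree(query_model, question, reply, other,
                                depth + 1, max_depth))
    return node

def preference_pairs(node: DialogueNode, gold: str) -> List[tuple]:
    """Mine (chosen, rejected) answer pairs from sibling turns: a turn that
    lands on the gold answer is preferred over a sibling that does not, so a
    preference-optimized model learns both to accept helpful persuasion and
    to resist misleading persuasion."""
    pairs = []
    for a in node.children:
        for b in node.children:
            if a.answer == gold and b.answer != gold:
                pairs.append((a.answer, b.answer))
        pairs.extend(preference_pairs(a, gold))
    return pairs

if __name__ == "__main__":
    # Dummy model standing in for a real LLM: always offers one wrong counter.
    dummy = lambda prompt: "Lisbon"
    tree = build_dialogue_tree(dummy, "What is the capital of France?", "Paris")
    print(preference_pairs(tree, gold="Paris"))
```

In an actual pipeline, the mined pairs would feed a preference-optimization objective (e.g., DPO-style training, consistent with the abstract's description), with full dialogue context rather than bare answers as the conditioning input.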