Humans learn social skills through both imitation and social interaction. Existing research on building language agents has largely left this social learning process unstudied. Motivated by this gap, we propose SOTOPIA-$\pi$, an interactive learning method that improves the social intelligence of language agents. The method applies behavior cloning and self-reinforcement training to social interaction data filtered according to large language model (LLM) ratings. We show that our training method allows a 7B LLM to reach the social goal completion ability of an expert model (a GPT-4-based agent), while improving the safety of language agents and maintaining general QA ability on the MMLU benchmark. We also find that this training paradigm uncovers some difficulties in LLM-based evaluation of social intelligence: LLM-based evaluators overestimate the abilities of language agents trained specifically for social interaction.
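As a rough illustration of the data-filtering step, the sketch below keeps only the interaction episodes whose LLM rating meets a cutoff, then separates expert-generated episodes (for behavior cloning) from the agent's own episodes (for self-reinforcement). The abstract does not specify the filtering rule, so the `Episode` fields, the `goal_score` rating, and the `threshold` parameter are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    """One social interaction episode (hypothetical structure)."""
    agent: str         # who produced the episode: "expert" or "self" (assumed labels)
    dialogue: str      # the interaction transcript
    goal_score: float  # LLM-assigned rating of goal completion (assumed scale)


def select_training_data(
    episodes: List[Episode], threshold: float
) -> Tuple[List[Episode], List[Episode]]:
    """Filter episodes by LLM rating, then split them by data source.

    Returns (behavior_cloning_data, self_reinforcement_data).
    The threshold rule is an assumption, not the paper's stated criterion.
    """
    kept = [ep for ep in episodes if ep.goal_score >= threshold]
    behavior_cloning = [ep for ep in kept if ep.agent == "expert"]
    self_reinforcement = [ep for ep in kept if ep.agent == "self"]
    return behavior_cloning, self_reinforcement
```

Both returned lists would then serve as supervised fine-tuning data; the only difference between the two training modes in this sketch is which policy generated the retained episodes.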