Humans learn social skills through both imitation and social interaction. This social learning process is largely understudied in existing research on building language agents. Motivated by this gap, we propose SOTOPIA-$\pi$, an interactive learning method that improves the social intelligence of language agents. The method applies behavior cloning and self-reinforcement training to social interaction data filtered according to large language model (LLM) ratings. We show that our training method allows a 7B LLM to reach the social goal completion ability of an expert model (a GPT-4-based agent), while improving the safety of language agents and maintaining general QA ability on the MMLU benchmark. We also find that this training paradigm uncovers some difficulties in LLM-based evaluation of social intelligence: LLM-based evaluators overestimate the abilities of language agents trained specifically for social interaction.
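The rating-based filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the data schema, the rating scale, or the selection rule, so everything named here (the `Episode` fields, `filter_by_rating`, `min_rating`, `top_k`) is a hypothetical choice made for illustration and not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One social interaction episode (hypothetical schema)."""
    scenario_id: str
    source: str      # "expert" for behavior cloning, "self" for self-reinforcement
    dialogue: str
    rating: float    # score assigned by an LLM rater; the scale is an assumption


def filter_by_rating(episodes, min_rating, top_k=1):
    """Keep, per scenario, the top_k episodes whose rating is >= min_rating.

    Assumed selection rule: threshold first, then rank within each scenario.
    """
    by_scenario = {}
    for ep in episodes:
        if ep.rating >= min_rating:
            by_scenario.setdefault(ep.scenario_id, []).append(ep)

    kept = []
    for eps in by_scenario.values():
        # Highest-rated episodes first; sorted() is stable for ties.
        ranked = sorted(eps, key=lambda e: e.rating, reverse=True)
        kept.extend(ranked[:top_k])
    return kept


if __name__ == "__main__":
    episodes = [
        Episode("s1", "expert", "expert dialogue", 9.0),
        Episode("s1", "self", "self-play dialogue", 6.0),
        Episode("s2", "self", "self-play dialogue", 8.5),
    ]
    training_set = filter_by_rating(episodes, min_rating=7.0)
    for ep in training_set:
        print(ep.scenario_id, ep.source, ep.rating)
```

Under these assumptions, the retained episodes would serve as supervised fine-tuning data: those with `source == "expert"` correspond to behavior cloning, and those with `source == "self"` to self-reinforcement.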