We explore the self-play training procedure of large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender converse around a target word that is visible only to the attacker. The attacker aims to induce the defender to utter the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players need sufficient knowledge of the target word and high-level reasoning ability to make inferences and express themselves in this information-restricted conversation. Hence, we investigate whether LLMs' reasoning ability can be further enhanced by self-play in this adversarial language game (SPAG). To this end, we select several open-source LLMs and let each act as the attacker, playing against a copy of itself as the defender over an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performance uniformly improves on a broad range of reasoning benchmarks. Furthermore, iterating this self-play process continuously promotes the LLMs' reasoning abilities. The code is at https://github.com/Linear95/SPAG.