By formally defining the training processes of large language models (LLMs), which usually encompass pre-training, supervised fine-tuning, and reinforcement learning from human feedback, within a single, unified machine learning paradigm, we can gain key insights for advancing LLM technologies. This position paper delineates the parallels between the training methods of LLMs and the strategies used to develop agents in two-player games, as studied in game theory, reinforcement learning, and multi-agent systems. We propose re-conceptualizing LLM learning processes as agent learning in language-based games. This framework offers new perspectives on the successes and challenges in LLM development, including a fresh view of how to address alignment issues, among other strategic considerations. Furthermore, our two-player game approach sheds light on novel data preparation and machine learning techniques for training LLMs.