The unparalleled performance of closed-source ChatGPT has sparked efforts toward its democratization, with notable strides made by leveraging dialogues between real users and ChatGPT, as evidenced by Vicuna. However, because dialogues involving human participants are difficult to collect, current efforts such as Baize and UltraChat rely on ChatGPT role-playing as a human based on instructions. This results in overdependence on seed data, diminished human-likeness, limited topic diversity, and an absence of genuine multi-round conversational dynamics. To address these issues, we propose a paradigm that better simulates human behavior and explore the benefits of incorporating more human-like questions into multi-turn conversations. Specifically, we directly use human questions extracted from genuine human-machine conversations as the learning target and present a novel user simulator called 'Socratic'. Experimental results show that our response model, 'PlatoLM', achieves state-of-the-art (SoTA) performance among LLaMA-based 7B models on MT-Bench. Our findings further demonstrate that our method introduces highly human-like questioning patterns and rich topic structures, which teach the response model more effectively than previous work in multi-round conversations.