The unparalleled performance of the closed-source ChatGPT has sparked efforts toward its democratization, with notable strides made by leveraging dialogues between real users and ChatGPT, as evidenced by Vicuna. However, because gathering dialogues with human participation is challenging, current endeavors such as Baize and UltraChat rely on ChatGPT role-playing humans according to instructions, which results in over-dependence on seed data, diminished human-likeness, limited topic diversity, and an absence of genuine multi-round conversational dynamics. To address these issues, we propose a paradigm that better simulates human behavior and explore the benefits of incorporating more human-like questions into multi-turn conversations. Specifically, we directly target human questions extracted from genuine human-machine conversations as the learning goal and provide a novel user simulator called `Socratic'. Experimental results show that our response model, `PlatoLM', achieves SoTA performance among LLaMA-based 7B models on MT-Bench. Our findings further demonstrate that our method introduces highly human-like questioning patterns and rich topic structures, which teach the response model better than previous works in multi-round conversations.
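The core idea of directly targeting human questions as the learning goal can be sketched as loss masking during fine-tuning: only the human-question turns of a genuine human-machine conversation are supervised, while the machine's responses are excluded from the loss. The following is a minimal, self-contained sketch of this masking scheme, using a toy whitespace tokenizer and the `-100` ignore-index convention; the function and role names are illustrative assumptions, not the paper's actual training code.

```python
# Sketch of user-simulator loss masking: supervise only human-question turns.
# Assumptions: a toy tokenizer and ("human"/"assistant", text) turn tuples;
# -100 is the conventional ignore-index for cross-entropy loss.

def build_labels(turns, tokenize):
    """Return (input_ids, labels) where assistant-turn tokens are masked with -100."""
    input_ids, labels = [], []
    for role, text in turns:
        ids = tokenize(f"{role}: {text}\n")
        input_ids.extend(ids)
        if role == "human":
            labels.extend(ids)                  # learn to generate human questions
        else:
            labels.extend([-100] * len(ids))    # exclude machine responses from the loss
    return input_ids, labels

# Toy whitespace "tokenizer" for illustration only.
toy_tokenize = lambda s: list(range(len(s.split())))

dialogue = [
    ("human", "How do transformers handle long context?"),
    ("assistant", "They use attention, but cost grows quadratically."),
    ("human", "Can you give a concrete workaround?"),
]
ids, labels = build_labels(dialogue, toy_tokenize)
supervised = sum(label != -100 for label in labels)
```

In a real pipeline the masked labels would feed a causal-LM cross-entropy loss, so the simulator is optimized purely to reproduce human questioning behavior conditioned on the preceding dialogue.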