An effective multi-turn instruction-following assistant can be developed by building a user simulator that generates useful interaction data. Beyond relying on the knowledge stored in its own weights, an ideal user simulator should also be able to rapidly bootstrap external knowledge in its raw form, so that it can simulate the diversity of text available on the internet. Previous user simulators generally lacked diversity, were mostly closed-domain, and required rigid schemas, which made them difficult to scale quickly to incorporate external knowledge. To address this, we introduce Kaucus, a Knowledge-Augmented User Simulator framework, which outlines a process for creating diverse user simulators that can seamlessly exploit external knowledge and benefit downstream assistant model training. Using two GPT-J-based simulators, a Retrieval Augmented Simulator and a Summary Controlled Simulator, we generate diverse simulator-assistant interactions. Through reward and preference model-based evaluations, we find that these interactions serve as useful training data and produce more helpful downstream assistants. We also find that incorporating knowledge through retrieval augmentation or summary control helps create better assistants.