The future of conversational agents lies in providing users with personalized information responses. However, a significant challenge in developing such models is the lack of large-scale dialogue datasets that span multiple sessions and reflect real-world user preferences. Previous approaches rely on experts in a Wizard-of-Oz setup, which is difficult to scale, particularly for personalized tasks. Our method, LAPS, addresses this by using large language models (LLMs) to guide a single human worker in generating personalized dialogues. This method has proven to speed up the creation process and improve quality. LAPS can collect large-scale, human-written, multi-session, multi-domain conversations, including the extraction of user preferences. Compared with existing datasets, LAPS-produced conversations are as natural and diverse as expert-created ones, in contrast with fully synthetic methods. The collected dataset is suited to training both preference extraction and personalized response generation. Our results show that responses generated explicitly using extracted preferences better match users' actual preferences, highlighting the value of extracted preferences over raw dialogue history alone. Overall, LAPS introduces a new method that leverages LLMs to create realistic personalized conversational data more efficiently and effectively than previous methods.