The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate language ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data fly-wheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
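The two mechanisms named in the abstract, rejecting low-scoring sampled responses at inference time and measuring engagement through mean conversation length, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names are hypothetical, `toy_reward` is a hand-written stand-in for a reward model trained on pseudo-labels from user interactions, and conversation length is taken to be the number of messages in a conversation.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Keep the highest-scoring candidate and reject the others.

    `candidates` stands for N responses sampled from the chatbot model;
    `reward_fn` stands for the trained reward model (assumed interface).
    """
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=reward_fn)


def mean_conversation_length(conversations: Sequence[Sequence[str]]) -> float:
    """Average number of messages per conversation (assumed definition of MCL)."""
    if not conversations:
        raise ValueError("need at least one conversation")
    return sum(len(conv) for conv in conversations) / len(conversations)


def relative_increase(baseline: float, treatment: float) -> float:
    """Relative change of a metric between two A/B groups, e.g. 0.7 for +70%."""
    return (treatment - baseline) / baseline


def toy_reward(response: str) -> float:
    # Hypothetical stand-in for a learned reward model: favours responses
    # that end with a question, with a small bonus for longer responses.
    ends_with_question = float(response.strip().endswith("?"))
    return ends_with_question + 0.01 * len(response.split())


if __name__ == "__main__":
    samples = ["Ok.", "That sounds fun! What did you do next?", "I see."]
    print(best_of_n(samples, toy_reward))

    # Illustrative conversations only, not data from the paper.
    control = [["hi", "hello"], ["hi", "hello", "bye", "see you"]]
    print(mean_conversation_length(control))
```

In a deployed system, `reward_fn` would wrap a forward pass of the reward model over the conversation context plus the candidate response, and the number of sampled candidates would trade response quality against inference cost.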