The emergence of pretrained large language models has led to the deployment of a range of social chatbots for chitchat. Although these chatbots demonstrate language ability and fluency, they are not guaranteed to be engaging and can struggle to retain users. This work investigates the development of social chatbots that prioritize user engagement to enhance retention, specifically examining the use of human feedback to efficiently develop highly engaging chatbots. The proposed approach uses automatic pseudo-labels collected from user interactions to train a reward model that can be used to reject low-scoring sample responses generated by the chatbot model at inference time. Intuitive evaluation metrics, such as mean conversation length (MCL), are introduced as proxies to measure the level of engagement of deployed chatbots. A/B testing on groups of 10,000 new daily chatbot users on the Chai Research platform shows that this approach increases the MCL by up to 70%, which translates to a more than 30% increase in user retention for a GPT-J 6B model. Future work aims to use the reward model to realise a data flywheel, where the latest user conversations can be used to alternately fine-tune the language model and the reward model.
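The two mechanisms named in the abstract, rejecting low-scoring sampled responses at inference time and measuring engagement by mean conversation length, can be illustrated with a minimal sketch. It is not the authors' implementation: `best_of_n`, `mean_conversation_length`, and `toy_reward` are hypothetical names, the toy scoring rule merely stands in for a trained reward model, and counting every message toward conversation length is an assumption, since the abstract does not define the unit.

```python
from statistics import mean
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Keep the highest-scoring candidate; every other sample is rejected."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=reward_fn)


def mean_conversation_length(conversations: Sequence[Sequence[str]]) -> float:
    """Average number of messages per conversation (assumed unit: one message)."""
    return mean(len(conversation) for conversation in conversations)


def toy_reward(response: str) -> float:
    # Hypothetical stand-in for a trained reward model: it favours responses
    # that end with a question, with a small bonus for length.
    return (1.0 if response.endswith("?") else 0.0) + 0.01 * len(response)


# Candidate responses that a chatbot model might sample for one user turn.
candidates = ["ok.", "I see.", "That sounds fun, what happened next?"]
chosen = best_of_n(candidates, toy_reward)

# Two logged conversations, with 2 and 4 messages respectively.
conversations = [["hi", "hello"], ["hi", "hello", "how are you?", "good"]]
mcl = mean_conversation_length(conversations)
```

In a deployed system the reward function would be a model trained on the pseudo-labels collected from user interactions, and the candidates would come from sampling the chatbot model several times per turn.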