We study how to improve social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals such as the length, sentiment, and reaction of future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployment data from BlenderBot (Xu et al., 2023). Human evaluation indicates that our new models improve over the baseline responses; however, we find that some proxy signals can also lead to more generations with undesirable properties. For example, optimizing for conversation length can lead to more controversial or unfriendly generations than the baseline, whereas optimizing for positive sentiment or reaction can decrease these behaviors.
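To make the idea of implicit signals concrete, the sketch below derives, for each bot turn in an episode, a few proxy quality signals from the human turns that follow it: the word count of the next human reply, a crude sentiment score for that reply, and the number of remaining turns as a stand-in for conversation length. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Turn` type, the `implicit_signals` and `toy_sentiment` helpers, and the tiny word lexicons are all hypothetical, and the lexicon is only a placeholder for a trained sentiment classifier. The reaction signal is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical toy lexicons; a real system would use a trained sentiment classifier.
POSITIVE_WORDS = {"great", "love", "thanks", "haha", "cool", "nice", "awesome"}
NEGATIVE_WORDS = {"boring", "hate", "stupid", "rude", "wrong", "annoying"}


@dataclass
class Turn:
    speaker: str  # "bot" or "human"
    text: str


def toy_sentiment(text: str) -> int:
    """Return +1, 0, or -1 from positive minus negative lexicon hits (placeholder only)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    return (score > 0) - (score < 0)


def implicit_signals(episode: List[Turn]) -> List[Dict]:
    """For each bot turn, compute proxy signals from the human turns that come after it."""
    records = []
    for i, turn in enumerate(episode):
        if turn.speaker != "bot":
            continue
        future_human = [t for t in episode[i + 1:] if t.speaker == "human"]
        next_reply = future_human[0].text if future_human else ""
        records.append({
            "bot_text": turn.text,
            # Length of the next human reply, in whitespace-separated words.
            "reply_length": len(next_reply.split()),
            # Sentiment of the next human reply (0 if the user never replied).
            "reply_sentiment": toy_sentiment(next_reply) if next_reply else 0,
            # Turns left in the episode: a rough proxy for conversation length.
            "remaining_turns": len(episode) - i - 1,
        })
    return records


episode = [
    Turn("human", "hi there"),
    Turn("bot", "Hello! Do you like hiking?"),
    Turn("human", "Yes I love hiking, thanks for asking"),
    Turn("bot", "Great, where do you go?"),
    Turn("human", "boring question"),
]
for record in implicit_signals(episode):
    print(record)
```

One possible use of such records, again only as an assumption, is to threshold a chosen signal (for example `reply_sentiment > 0`) into positive and negative labels for bot utterances. The abstract's warning applies here: which signal is thresholded matters, since optimizing for conversation length and optimizing for positive sentiment can push generations in different directions.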