Smart Reply (SR) systems present a user with a set of replies, one of which can be selected instead of typing out a response. To perform well at this task, a system should present the user with a diverse set of options, to maximise the chance that at least one of them conveys the user's desired response. This is a significant challenge because datasets containing sets of responses to learn from are lacking. As a result, previous work has focused largely on post-hoc diversification rather than explicitly learning to predict sets of responses. Motivated by this problem, we present a novel method, SimSR, that employs model-based simulation to discover high-value response sets by simulating possible user responses with a learned world model. Unlike previous approaches, this allows our method to directly optimise the end goal of SR: maximising the relevance of at least one of the predicted replies. Empirically, on two public datasets, our method achieves up to 21% and 18% improvement in ROUGE score and Self-ROUGE score, respectively, compared to SoTA baselines.
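The set-level objective described above (maximising the relevance of at least one predicted reply) can be illustrated with a small sketch. This is not the SimSR implementation: it assumes the simulated user replies have already been sampled from some learned world model, it uses unigram-overlap F1 as a crude stand-in for ROUGE, and it uses greedy search, which is an assumption since the abstract does not specify the search procedure. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def unigram_f1(a: str, b: str) -> float:
    """Token-overlap F1; a crude stand-in for a ROUGE-style relevance score."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    overlap = sum((ta & tb).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(ta.values())
    recall = overlap / sum(tb.values())
    return 2 * precision * recall / (precision + recall)


def set_value(reply_set: Sequence[str], simulated: Sequence[str]) -> float:
    """Average, over simulated user replies, of the best match in the set."""
    if not reply_set or not simulated:
        return 0.0
    total = sum(max(unigram_f1(r, s) for r in reply_set) for s in simulated)
    return total / len(simulated)


def greedy_select(candidates: Sequence[str],
                  simulated: Sequence[str],
                  k: int) -> List[str]:
    """Greedily grow a set of up to k replies that maximises set_value.

    Assumption: greedy search is used here only for illustration.
    """
    chosen: List[str] = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda c: set_value(chosen + [c], simulated))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical data: candidate replies and replies sampled from a world model.
candidates = ["yes sounds good", "yes that sounds good", "no sorry", "maybe later"]
simulated = ["yes sounds good", "no sorry i cannot", "yes sounds great"]
print(greedy_select(candidates, simulated, k=2))
```

Because the score rewards only the best match per simulated reply, a near-duplicate of an already chosen reply adds little value, so diversity emerges from the objective itself rather than from a post-hoc diversification step.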