Reinforcement learning has been widely adopted to model dialogue managers in task-oriented dialogues. However, the user simulators provided by state-of-the-art dialogue frameworks are only rough approximations of human behaviour. The ability to learn from a small number of human interactions is hence crucial, especially in multi-domain and multi-task environments where the action space is large. We therefore propose to use structured policies to improve sample efficiency when learning in these kinds of environments. We also evaluate the impact of learning from human versus simulated experts. Among the different levels of structure that we tested, graph neural networks (GNNs) perform markedly better, reaching a success rate above 80% with only 50 dialogues when learning from simulated experts. They remain superior when learning from human experts, although a performance drop was observed, indicating a possible difficulty in capturing the variability of human strategies. We therefore suggest concentrating future research efforts on bridging the gap between human data, simulators and automatic evaluators in dialogue frameworks.