Reinforcement learning has been applied to train dialog systems in many works. Previous approaches divide the dialog system into multiple modules, including dialog state tracking (DST) and dialog policy (DP), and train these modules simultaneously. However, the modules influence each other during training: errors from DST can misguide the dialog policy, and the changing system actions make the DST module's task harder. To alleviate this problem, we propose the Asynchronous Updating Reinforcement Learning (AURL) framework, which updates the DST module and the DP module asynchronously under a cooperative setting. Furthermore, curriculum learning is used to address the unbalanced data distribution that arises during reinforcement learning sampling, and multiple user models are introduced to increase dialog diversity. Results on the public SSD-PHONE dataset show that our method achieves a 31.37% improvement in dialog success rate. The code is publicly available at https://github.com/shunjiu/AURL.
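The idea of updating the DST and DP modules asynchronously can be illustrated with a minimal sketch. The abstract does not specify the update schedule, so the alternating-phase scheme below (one module is updated for a fixed number of steps while the other is held fixed) is an assumption made for illustration only. The names `alternating_schedule`, `run_schedule`, and `steps_per_phase` are hypothetical and are not taken from the AURL code.

```python
from typing import Callable, Dict, List


def alternating_schedule(num_rounds: int, steps_per_phase: int) -> List[str]:
    """Build a list naming the single module updated at each training step.

    Assumption (not from the paper): each round has a DST phase followed by
    a DP phase, and each phase lasts `steps_per_phase` steps.
    """
    schedule: List[str] = []
    for _ in range(num_rounds):
        for module in ("DST", "DP"):
            schedule.extend([module] * steps_per_phase)
    return schedule


def run_schedule(update_fns: Dict[str, Callable[[], None]],
                 schedule: List[str]) -> Dict[str, int]:
    """Call only the scheduled module's update function at each step.

    The module that is not scheduled is left untouched for that step,
    so the two modules are never updated at the same time.
    """
    counts = {name: 0 for name in update_fns}
    for module in schedule:
        update_fns[module]()  # placeholder for one RL update of this module
        counts[module] += 1
    return counts


if __name__ == "__main__":
    log: List[str] = []
    fns = {
        "DST": lambda: log.append("update DST"),
        "DP": lambda: log.append("update DP"),
    }
    print(run_schedule(fns, alternating_schedule(num_rounds=2, steps_per_phase=3)))
```

In a real system each update function would perform a gradient step on the corresponding module using dialogs sampled against the user models; here they are stand-ins so that the scheduling logic can be read in isolation.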