In this paper, we study the technical problem of developing conversational agents that can quickly adapt to unseen tasks, learn task-specific communication tactics, and help listeners finish complex, temporally extended tasks. We find that the uncertainty of language learning can be decomposed into an entropy term and a mutual information term, corresponding to the structural and functional aspects of language, respectively. Combined with reinforcement learning, our method automatically requests human samples for training when adapting to new tasks and learns communication protocols that are succinct and helpful for task completion. Results from human and simulated evaluations on a referential game and a 3D navigation game demonstrate the effectiveness of the proposed method.
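The abstract does not state the decomposition itself, so the following is only a generic illustration, not the paper's formulation. It uses the standard information-theoretic identity H(M) = H(M | G) + I(M; G) on a toy joint distribution to show how a total uncertainty can split into an entropy term and a mutual information term. The variable names (`message`, `goal`), the function `decompose`, and the example distributions are all assumptions introduced here.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a list of probabilities (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def decompose(joint):
    """Split H(message) into H(message | goal) + I(message; goal).

    joint[i][j] is a hypothetical P(message=i, goal=j); rows are messages,
    columns are goals. Returns (h_message, h_message_given_goal, mutual_info).
    """
    p_message = [sum(row) for row in joint]
    p_goal = [sum(col) for col in zip(*joint)]
    h_message = entropy(p_message)
    h_goal = entropy(p_goal)
    h_joint = entropy([p for row in joint for p in row])
    mutual_info = h_message + h_goal - h_joint    # I(M; G)
    h_message_given_goal = h_joint - h_goal       # H(M | G)
    return h_message, h_message_given_goal, mutual_info


# Toy case 1: each message identifies exactly one goal.
informative = [[0.5, 0.0],
               [0.0, 0.5]]
# Toy case 2: messages are independent of the goal.
uninformative = [[0.25, 0.25],
                 [0.25, 0.25]]

for name, joint in [("informative", informative), ("uninformative", uninformative)]:
    total, conditional, mi = decompose(joint)
    print(f"{name}: H(M)={total:.2f}  H(M|G)={conditional:.2f}  I(M;G)={mi:.2f}")
```

In both toy cases the message entropy is the same, but only the first carries information about the goal. Under the assumptions above, this is one way to read the distinction the abstract draws between the structural and functional aspects of language.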