The ability to pick up on language signals in an ongoing interaction is crucial for future machine learning models to collaborate and interact with humans naturally. In this paper, we present an initial study that evaluates intra-episodic feedback given in a collaborative setting. We use a referential language game as a controllable example of a task-oriented collaborative joint activity. A teacher utters a referring expression generated by a well-known symbolic algorithm (the "Incremental Algorithm") as an initial instruction and then monitors the follower's actions, possibly intervening with intra-episodic feedback (which does not have to be explicitly requested). We frame this task as a reinforcement learning problem with sparse rewards and learn a follower policy for a heuristic teacher. Our results show that intra-episodic feedback allows the follower to generalize on aspects of scene complexity and that a follower receiving such feedback performs better than one given only the initial statement.
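To make the teacher's initial instruction more concrete, the following is a minimal sketch of the core loop of the Incremental Algorithm for referring expression generation. It is not the implementation used in this study: the scene representation (dictionaries of attribute values), the attribute names, and the preference order are all assumptions chosen for illustration, and the classic rule of always including the object type is omitted for brevity.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical scene representation: each object is a dict of attribute values.
Entity = Dict[str, str]


def incremental_algorithm(
    target: Entity,
    distractors: List[Entity],
    preference_order: List[str],
) -> Optional[List[Tuple[str, str]]]:
    """Select attribute-value pairs that distinguish target from distractors.

    Attributes are tried in the given preference order. An attribute is kept
    only if the target's value rules out at least one remaining distractor.
    Returns None when the target cannot be uniquely identified.
    """
    remaining = list(distractors)
    description: List[Tuple[str, str]] = []
    for attribute in preference_order:
        if not remaining:
            break  # the target is already uniquely identified
        value = target.get(attribute)
        if value is None:
            continue  # the target has no value for this attribute
        ruled_out = [d for d in remaining if d.get(attribute) != value]
        if ruled_out:
            description.append((attribute, value))
            remaining = [d for d in remaining if d.get(attribute) == value]
    return description if not remaining else None


# Illustrative scene; the attribute names and values are assumptions.
target = {"color": "red", "shape": "T", "position": "top left"}
distractors = [
    {"color": "blue", "shape": "T", "position": "top left"},
    {"color": "red", "shape": "Z", "position": "bottom right"},
]
expression = incremental_algorithm(target, distractors, ["color", "shape", "position"])
print(expression)  # [('color', 'red'), ('shape', 'T')]
```

In this example the color rules out the first distractor and the shape rules out the second, so the position is never needed. A teacher could verbalize the selected pairs as the initial instruction and then fall back on feedback while the follower acts.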