In collaborative goal-oriented settings, participants are not only interested in achieving a successful outcome but also implicitly negotiate the effort they put into the interaction by adapting to each other. In this work, we propose a challenging interactive reference game that requires two players to coordinate on vision and language observations. The learning signal in this game is a score, given after playing, that takes into account the achieved goal and the players' assumed efforts during the interaction. We show that a standard Proximal Policy Optimization (PPO) setup achieves a high success rate when bootstrapped with heuristic partner behaviors that implement insights from the analysis of human-human interactions. We also find that a pairing of neural partners indeed reduces the measured joint effort when playing together repeatedly. However, in comparison to a reasonable heuristic pairing, there is still room for improvement, which invites further research in the direction of cost-sharing in collaborative interactions.