Predicting the next action that a human is most likely to perform is key to human-AI collaboration and has consequently attracted increasing research interest in recent years. An important factor in next action prediction is human intention: if the AI agent knows the intention, it can predict future actions and plan collaboration more effectively. Existing Bayesian methods for this task struggle with complex visual input, while deep neural network (DNN) based methods do not provide uncertainty quantification. In this work we combine both approaches for the first time and show that the predicted next action probabilities contain information that can be used to infer the underlying intention. We propose a two-step approach to human intention prediction: a DNN predicts the probabilities of the next action, and Markov chain Monte Carlo (MCMC) based Bayesian inference then infers the underlying intention from these predictions. This approach allows the DNN architecture to be designed independently and makes the subsequent inference of human intentions fast and independent of that design. We evaluate our method in a series of experiments on the Watch-And-Help (WAH) dataset and a keyboard and mouse interaction dataset. Our results show that our approach can accurately predict human intentions from observed actions and the implicit information contained in next action probabilities. Furthermore, we show that our approach can predict the correct intention even if only a few actions have been observed.
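The two-step idea can be illustrated with a minimal sketch. It is not the model used in this work: the action and intention names, the lookup table standing in for a trained DNN, the uniform prior, and the Metropolis-Hastings sampler with a uniform proposal are all assumptions chosen to keep the example small and self-contained.

```python
import math
import random
from collections import Counter

# Hypothetical action and intention vocabularies (not taken from WAH).
ACTIONS = ["open_fridge", "grab_plate", "turn_on_tv"]
INTENTIONS = ["prepare_meal", "set_table", "watch_tv"]

# Step 1 stand-in: a fixed table plays the role of a trained DNN that
# outputs next-action probabilities for each candidate intention.
NEXT_ACTION_PROBS = {
    "prepare_meal": [0.7, 0.2, 0.1],
    "set_table": [0.2, 0.7, 0.1],
    "watch_tv": [0.1, 0.1, 0.8],
}


def log_likelihood(intention, observed_actions):
    """Log-probability of the observed actions under one intention,
    treating actions as independent given the intention (an assumption)."""
    probs = NEXT_ACTION_PROBS[intention]
    return sum(math.log(probs[ACTIONS.index(a)]) for a in observed_actions)


def infer_intention(observed_actions, n_samples=5000, burn_in=500, seed=0):
    """Step 2: Metropolis-Hastings over the discrete intention variable.

    With a uniform prior and a symmetric uniform proposal, the acceptance
    ratio reduces to the likelihood ratio.
    """
    rng = random.Random(seed)
    current = rng.choice(INTENTIONS)
    current_ll = log_likelihood(current, observed_actions)
    counts = Counter()
    for step in range(n_samples + burn_in):
        proposal = rng.choice(INTENTIONS)
        proposal_ll = log_likelihood(proposal, observed_actions)
        accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < accept_prob:
            current, current_ll = proposal, proposal_ll
        if step >= burn_in:
            counts[current] += 1
    # Relative visit frequencies approximate the posterior over intentions.
    return {i: counts[i] / n_samples for i in INTENTIONS}


if __name__ == "__main__":
    posterior = infer_intention(["open_fridge", "open_fridge", "grab_plate"])
    for intention, prob in posterior.items():
        print(f"{intention}: {prob:.3f}")
```

Under this toy table, the exact posterior for `prepare_meal` after the three observed actions is 0.098 / 0.127, roughly 0.77, so the sampled estimate should land close to that value. Because the sampler only consumes next-action probabilities, the table could be swapped for any predictor with the same output format without changing the inference step.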