Predicting the next action that a human is most likely to perform is key to human-AI collaboration and has consequently attracted increasing research interest in recent years. Human intentions are an important factor in next action prediction: if the AI agent knows the intention, it can predict future actions and plan collaboration more effectively. Existing Bayesian methods for this task struggle with complex visual input, while deep neural network (DNN) based methods do not provide uncertainty quantification. In this work we combine both approaches for the first time and show that the predicted next action probabilities contain information that can be used to infer the underlying intention. We propose a two-step approach to human intention prediction: a DNN predicts the probabilities of the next action, and MCMC-based Bayesian inference then infers the underlying intention from these predictions. This approach not only allows the DNN architecture to be designed independently but also enables fast, design-independent inference of human intentions afterwards. We evaluate our method in a series of experiments on the Watch-And-Help (WAH) dataset and a keyboard and mouse interaction dataset. Our results show that our approach can accurately predict human intentions from observed actions and the implicit information contained in next action probabilities. Furthermore, we show that our approach can predict the correct intention even if only a few actions have been observed.
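To make the two-step idea concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a small discrete set of intentions, a uniform prior over them, and a hypothetical array `action_probs` that stands in for the DNN output (the probability of each action at each step under each intention). The function names, the array layout, and the Metropolis-Hastings sampler with a uniform proposal are all illustrative assumptions.

```python
import numpy as np


def log_likelihood(intention, actions, action_probs):
    """Log-probability of the observed actions under one candidate intention.

    action_probs[i, t, a] is a hypothetical stand-in for the DNN output:
    P(action a at step t | intention i).
    """
    steps = np.arange(len(actions))
    probs = action_probs[intention, steps, actions]
    return float(np.sum(np.log(probs + 1e-12)))


def infer_intention(actions, action_probs, n_samples=5000, seed=0):
    """Metropolis-Hastings over discrete intentions, assuming a uniform prior."""
    rng = np.random.default_rng(seed)
    n_intentions = action_probs.shape[0]
    current = int(rng.integers(n_intentions))
    current_ll = log_likelihood(current, actions, action_probs)
    counts = np.zeros(n_intentions)
    for _ in range(n_samples):
        # Uniform proposal is symmetric, so the acceptance ratio reduces
        # to the likelihood ratio (the uniform prior cancels as well).
        proposal = int(rng.integers(n_intentions))
        proposal_ll = log_likelihood(proposal, actions, action_probs)
        if np.log(rng.random()) < proposal_ll - current_ll:
            current, current_ll = proposal, proposal_ll
        counts[current] += 1
    # Visit frequencies approximate the posterior over intentions.
    return counts / n_samples


# Toy stand-in for DNN predictions: 3 intentions, 5 steps, 4 actions.
# Under intention i, action i has probability 0.7 and the others 0.1.
action_probs = np.full((3, 5, 4), 0.1)
for i in range(3):
    action_probs[i, :, i] = 0.7

observed = np.array([1, 1, 1, 1, 1])  # the human keeps choosing action 1
posterior = infer_intention(observed, action_probs)
print(posterior)
```

Because the sampler only consumes predicted probabilities, the network that produces them can be swapped without changing the inference step, which is the design-independence described above. With so few intentions the posterior could also be computed exactly by enumeration; sampling is shown only to mirror the MCMC-based step.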