In this paper, we address the problem of short-term action anticipation, i.e., predicting an upcoming action one second before it happens. We propose to harness high-level intent information to anticipate future actions. To this end, we add a goal prediction branch to our model and introduce a consistency loss that encourages the anticipated actions to conform to the high-level goal pursued in the video. Our experiments show the effectiveness of the proposed approach and demonstrate that it achieves state-of-the-art results on two large-scale datasets: Assembly101 and COIN.
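The abstract does not give the form of the consistency loss, so the following is only an illustrative sketch of one way such a term could be written, not the formulation used in the paper. It assumes the model outputs action logits and goal logits, and that a binary compatibility matrix `compat[g][a]` (a hypothetical input, e.g. derived from training annotations) marks which actions can occur under which goal. The loss is the negative log of the probability mass placed on compatible (goal, action) pairs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(action_logits, goal_logits, compat, eps=1e-12):
    """Hypothetical goal-action consistency term (an assumption, not the paper's loss).

    compat[g][a] is 1 if action a is considered compatible with goal g, else 0.
    Returns -log of the joint probability mass on compatible (goal, action) pairs,
    treating the two predicted distributions as independent.
    """
    p_action = softmax(action_logits)
    p_goal = softmax(goal_logits)
    mass = sum(
        p_goal[g] * p_action[a] * compat[g][a]
        for g in range(len(p_goal))
        for a in range(len(p_action))
    )
    return -math.log(mass + eps)


# Toy setup: two goals, two actions, each action belongs to exactly one goal.
compat = [[1, 0],
          [0, 1]]
consistent = consistency_loss([5.0, 0.0], [5.0, 0.0], compat)    # action 0 with goal 0
inconsistent = consistency_loss([0.0, 5.0], [5.0, 0.0], compat)  # action 1 with goal 0
```

In this toy setup the loss is small when the predicted action and goal agree and large when they disagree. In training, a term of this kind would typically be added to the standard action and goal classification losses with a weighting factor.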