Abductive reasoning aims to make the most likely inference for a given set of incomplete observations. In this work, we propose a new task called abductive action inference, in which, given a situation, the model answers the question `what actions were executed by the human in order to arrive at the current state?'. Given a state, we investigate three abductive inference problems: action set prediction, action sequence prediction, and abductive action verification. We benchmark several SOTA models, including Transformers, graph neural networks, CLIP, BLIP, and end-to-end trained SlowFast and ResNet50-3D models. Our newly proposed object-relational BiGED model outperforms all other methods on this challenging task on the Action Genome dataset. Code will be made available.