Abductive reasoning aims to make the most likely inference from a set of incomplete observations. In this paper, we introduce a new research task, "abductive action inference", which asks which actions a human executed to reach the state shown in a single snapshot. We study three abductive inference problems: action set prediction, action sequence prediction, and abductive action verification. To tackle these tasks, we investigate established models, including Transformers, Graph Neural Networks, CLIP, BLIP, GPT-3, and end-to-end trained SlowFast, ResNet50-3D, and ViT models. We also introduce several models tailored to abductive action inference: a relational graph neural network, a relational bilinear pooling model, a relational rule-based inference model, a relational GPT-3 prompt method, and a relational Transformer model. The newly proposed object-relational bilinear graph encoder-decoder (BiGED) model is the most effective of all methods evaluated on the Action Genome dataset. These contributions make progress toward comprehending the implications of human actions and making highly plausible inferences about their outcomes.