Unlike most previous human-object interaction (HOI) methods, which focus on learning better human-object features, we propose a novel and complementary approach called category query learning. Such queries are explicitly associated with interaction categories, converted into image-specific category representations via a transformer decoder, and learned via an auxiliary image-level classification task. This idea is motivated by an earlier multi-label image classification method, but is applied here for the first time to the challenging task of HOI classification. Our method is simple, general and effective. It is validated on three representative HOI baselines and achieves new state-of-the-art results on two benchmarks.
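The pipeline described above (category queries, a transformer decoder that makes them image-specific, and an auxiliary image-level multi-label classification loss) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a single-layer, single-head decoder step with a residual connection and omits learned projections, layer normalization and the feed-forward sublayer. Random vectors stand in for the learned category queries, classifier weights and backbone features. All function names (`cross_attention`, `category_query_forward`, `bce_loss`) are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, features):
    """Simplified decoder step (assumption): each category query attends over
    all image feature tokens (single head, no learned projections), and the
    attended vector is added back to the query as a residual."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([dot(q, f) * scale for f in features])
        attended = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]
        outputs.append([qj + aj for qj, aj in zip(q, attended)])
    return outputs


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy: each category is an independent yes/no label."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def category_query_forward(category_queries, image_features, cls_weights, cls_bias):
    """Hypothetical forward pass: category queries -> image-specific category
    representations -> per-category image-level probabilities."""
    reps = cross_attention(category_queries, image_features)
    logits = [dot(w, r) + b for w, r, b in zip(cls_weights, reps, cls_bias)]
    return reps, [sigmoid(z) for z in logits]


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy setup: random values stand in for learned parameters and real features.
random.seed(0)
NUM_CATEGORIES, NUM_TOKENS, DIM = 4, 6, 8
queries = rand_matrix(NUM_CATEGORIES, DIM)   # one query per interaction category
features = rand_matrix(NUM_TOKENS, DIM)      # flattened feature tokens of one image
cls_w = rand_matrix(NUM_CATEGORIES, DIM)     # one binary classifier per category
cls_b = [0.0] * NUM_CATEGORIES
labels = [1, 0, 0, 1]                        # interactions present anywhere in the image

reps, probs = category_query_forward(queries, features, cls_w, cls_b)
loss = bce_loss(probs, labels)
print(len(reps), len(reps[0]))  # 4 8
print(f"auxiliary image-level loss: {loss:.4f}")
```

Using an independent sigmoid and binary cross-entropy per category (rather than a softmax over categories) reflects the multi-label nature of the auxiliary task: one image can contain several interaction categories at once.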