DETR introduces a simplified one-stage framework for scene graph generation (SGG) but suffers from two challenges: sparse supervision and false-negative samples. The former occurs because each image typically contains fewer than 10 relation annotations, while DETR-based SGG models employ over 100 relation queries, and each ground truth relation is assigned to only one query during training. The latter arises because a ground truth relation may have multiple queries with similar matching scores, so suboptimally matched queries are treated as negative samples. To address these issues, we propose Hydra-SGG, a one-stage SGG method built on a Hybrid Relation Assignment, which combines One-to-One Relation Assignment with IoU-based One-to-Many Relation Assignment, increasing the number of positive training samples and mitigating sparse supervision. We further demonstrate empirically that removing self-attention between relation queries leads to duplicate predictions, which in fact benefits the proposed One-to-Many Relation Assignment. Building on this insight, we introduce Hydra Branch, an auxiliary decoder without self-attention layers, which further strengthens One-to-Many Relation Assignment by encouraging different queries to make the same relation prediction. Hydra-SGG achieves state-of-the-art performance on multiple datasets: VG150 (16.0 mR@50), Open Images V6 (50.1 weighted score), and GQA (12.7 mR@50).
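The Hybrid Relation Assignment described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`hybrid_assign`, `iou`), the cost matrix, and the IoU threshold value are assumptions. For each ground truth relation it keeps the single best-matched query (one-to-one) and additionally marks as positive any query whose predicted subject and object boxes both overlap the ground truth above a threshold (one-to-many).

```python
import numpy as np

def iou(a, b):
    # IoU of one box a (4,) against M boxes b (M, 4), format [x1, y1, x2, y2].
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def hybrid_assign(gt_sub, gt_obj, pred_sub, pred_obj, cost, iou_thr=0.7):
    """Hybrid assignment sketch: one-to-one (lowest matching cost) plus
    IoU-based one-to-many extras. Returns {gt index: [query indices]}.

    gt_sub/gt_obj:    (G, 4) ground truth subject/object boxes
    pred_sub/pred_obj:(Q, 4) per-query predicted subject/object boxes
    cost:             (G, Q) matching cost (hypothetical; lower is better)
    """
    positives = {}
    for g in range(len(gt_sub)):
        one2one = int(np.argmin(cost[g]))        # best-matched query
        iou_s = iou(gt_sub[g], pred_sub)         # subject-box overlap
        iou_o = iou(gt_obj[g], pred_obj)         # object-box overlap
        extra = np.where((iou_s > iou_thr) & (iou_o > iou_thr))[0]
        positives[g] = sorted({one2one, *extra.tolist()})
    return positives
```

In this toy setting, a query whose subject and object boxes nearly coincide with the ground truth becomes an extra positive instead of a false negative, which is the effect the Hybrid Relation Assignment targets.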