Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection

Visual Relationship Detection (VRD) has recently seen significant advancements with Transformer-based architectures. However, we identify two key limitations in the conventional label assignment used to train Transformer-based VRD models, which is the process of mapping each ground-truth (GT) to a prediction. First, under the conventional assignment, queries remain unspecialized: every query is expected to detect every relation, which makes it difficult for a query to specialize in specific relations. Second, queries are insufficiently trained: a GT is assigned to only a single prediction, so near-correct or even correct predictions are suppressed by being assigned "no relation" as their GT. To address these issues, we propose Groupwise Query Specialization and Quality-Aware Multi-Assignment (SpeaQ). Groupwise Query Specialization trains specialized queries by dividing queries and relations into disjoint groups and directing each query in a given query group solely toward relations in the corresponding relation group. Quality-Aware Multi-Assignment further facilitates training by assigning a GT to multiple predictions that are significantly close to the GT in terms of the subject, the object, and the relation in between. Experimental results and analyses show that SpeaQ effectively trains specialized queries that better utilize the capacity of a model, resulting in consistent performance gains with zero additional inference cost across multiple VRD models and benchmarks. Code is available at https://github.com/mlvlab/SpeaQ.
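
To make the two ideas concrete, here is a minimal Python sketch of a group-restricted, quality-aware assignment. It is not the authors' implementation (see the repository linked above for that). The function names, the even and round-robin grouping rules, the product-form quality score, and the `threshold` and `max_k` parameters are all assumptions chosen for illustration.

```python
def assign_groups(num_queries, relation_ids, num_groups):
    """Split queries and relation classes into disjoint groups.

    Assumption: queries are split into contiguous, roughly equal chunks and
    relations are distributed round-robin. The abstract does not fix a rule.
    """
    query_group = [q * num_groups // num_queries for q in range(num_queries)]
    relation_group = {r: i % num_groups for i, r in enumerate(relation_ids)}
    return query_group, relation_group


def triplet_quality(subject_score, object_score, relation_score):
    """Hypothetical quality score: product of the three closeness terms in [0, 1]."""
    return subject_score * object_score * relation_score


def multi_assign(quality, gt_relations, query_group, relation_group,
                 threshold=0.5, max_k=2):
    """Assign each GT to up to max_k close predictions from its own query group.

    quality[g][q] is the closeness between GT g and the prediction of query q.
    Returns {query index: GT index}; queries left out are trained as "no relation".
    """
    best = {}  # query index -> (quality, GT index)
    for g, rel in enumerate(gt_relations):
        group = relation_group[rel]
        # Groupwise specialization: only queries in the matching group are eligible.
        eligible = [q for q in range(len(query_group)) if query_group[q] == group]
        ranked = sorted(eligible, key=lambda q: quality[g][q], reverse=True)
        # Quality-aware multi-assignment: keep every sufficiently close prediction.
        chosen = [q for q in ranked if quality[g][q] >= threshold][:max_k]
        if not chosen and ranked:
            chosen = ranked[:1]  # fall back to the single best eligible prediction
        for q in chosen:
            # If two GTs compete for one query, keep the closer GT.
            if q not in best or quality[g][q] > best[q][0]:
                best[q] = (quality[g][q], g)
    return {q: g for q, (_, g) in best.items()}


query_group, relation_group = assign_groups(
    num_queries=4, relation_ids=["on", "holding", "wearing", "riding"], num_groups=2
)
quality = [
    [0.9, 0.8, 0.95, 0.1],  # GT 0 ("on") against queries 0..3
    [0.2, 0.9, 0.3, 0.6],   # GT 1 ("riding") against queries 0..3
]
assignment = multi_assign(quality, ["on", "riding"], query_group, relation_group)
print(assignment)
```

In this toy run, GT 0 is assigned to both queries 0 and 1 because both are close enough, while query 2 is left unassigned even though its prediction scores 0.95 against GT 0, because it belongs to the other query group. Since the grouping only changes which targets are used during training, a scheme like this adds nothing at inference time.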

