Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

Recent DEtection TRansformer-based (DETR) models have obtained remarkable performance. Their success depends on the re-introduction of multi-scale feature fusion in the encoder. However, multi-scale features greatly increase the number of tokens, with low-level features accounting for about 75% of them, and this is computationally inefficient and hinders real applications of DETR models. In this paper, we present Lite DETR, a simple yet efficient end-to-end object detection framework that can reduce the GFLOPs of the detection head by 60% while keeping 99% of the original performance. Specifically, we design an efficient encoder block that updates high-level features (corresponding to small-resolution feature maps) and low-level features (corresponding to large-resolution feature maps) in an interleaved way. In addition, to better fuse cross-scale features, we develop a key-aware deformable attention to predict more reliable attention weights. Comprehensive experiments validate the effectiveness and efficiency of the proposed Lite DETR, and the efficient encoder strategy generalizes well across existing DETR-based models. The code will be available at https://github.com/IDEA-Research/Lite-DETR.
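
As a rough illustration of where a figure like 75% can come from, the sketch below counts encoder tokens per feature level. It assumes a four-scale feature pyramid with strides 8, 16, 32 and 64 and a 1024x1024 input; neither is stated in the abstract, and the function name `token_share` is hypothetical.

```python
import math


def token_share(height, width, strides=(8, 16, 32, 64)):
    """Return the fraction of encoder tokens contributed by each feature level.

    The strides are an assumption for illustration (a common multi-scale
    setup), not values taken from the abstract.
    """
    # A level with stride s has ceil(H / s) * ceil(W / s) spatial positions,
    # and each position becomes one encoder token.
    counts = [math.ceil(height / s) * math.ceil(width / s) for s in strides]
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical 1024x1024 input; the first entry is the lowest-level
# (largest-resolution) feature map.
shares = token_share(1024, 1024)
for stride, share in zip((8, 16, 32, 64), shares):
    print(f"stride {stride:>2}: {share:.1%} of tokens")
```

Under these assumptions the stride-8 map alone holds roughly three quarters of all tokens, so updating it less often than the high-level maps is where most of the encoder savings would come from.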

