In this report, we present RT-DETRv2, an improved Real-Time DEtection TRansformer (RT-DETR). RT-DETRv2 builds on the previous state-of-the-art real-time detector, RT-DETR. It introduces a bag of freebies that improves flexibility and practicality, and it optimizes the training strategy to achieve better performance. To improve flexibility, we propose setting a distinct number of sampling points for features at each scale in the deformable attention, so that the decoder performs selective multi-scale feature extraction. To enhance practicality, we propose an optional discrete sampling operator to replace the grid_sample operator, which RT-DETR uses and YOLOs do not. This removes the deployment constraints typically associated with DETRs. For the training strategy, we propose dynamic data augmentation and scale-adaptive hyperparameter customization, which improve performance without any loss of speed. Source code and pre-trained models will be available at https://github.com/lyuwenyu/RT-DETR.
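The first two ideas in the abstract, per-scale sampling-point counts and a discrete replacement for grid_sample, can be illustrated with a toy multi-scale deformable attention step. The following is a minimal pure-Python sketch under stated assumptions, not the RT-DETRv2 implementation. It handles a single query, head, and channel. It assumes sampling locations normalized to [0, 1] and a softmax taken jointly over all points of all levels, as in Deformable DETR. It also assumes that the discrete operator is a nearest-cell lookup. All function names are hypothetical. The bilinear path is meant to mimic PyTorch's grid_sample with align_corners=False and zero padding.

```python
import math


def bilinear_sample(fmap, x, y):
    """Bilinear lookup at normalized (x, y) in [0, 1]; out-of-range neighbors
    contribute zero (like grid_sample with align_corners=False, zero padding)."""
    h, w = len(fmap), len(fmap[0])
    px, py = x * w - 0.5, y * h - 0.5  # continuous pixel coordinates (cell centers at i + 0.5)
    x0, y0 = math.floor(px), math.floor(py)
    total = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            wx = 1.0 - abs(px - xi)  # interpolation weight along x
            wy = 1.0 - abs(py - yi)  # interpolation weight along y
            if 0 <= xi < w and 0 <= yi < h:
                total += wx * wy * fmap[yi][xi]
    return total


def discrete_sample(fmap, x, y):
    """Hypothetical discrete sampler: read the cell containing (x, y).
    Only rounding and indexing are needed, with no interpolation."""
    h, w = len(fmap), len(fmap[0])
    xi = min(max(math.floor(x * w), 0), w - 1)
    yi = min(max(math.floor(y * h), 0), h - 1)
    return fmap[yi][xi]


def multi_scale_deform_attn(fmaps, ref, offsets, logits, sampler):
    """One query, one head, one channel.

    fmaps:   list of L feature maps, each an H_l x W_l list of lists
    ref:     reference point (x, y), normalized to [0, 1]
    offsets: offsets[l] is a list of (dx, dy); its length is the number of
             sampling points for level l, which may differ between levels
    logits:  attention logits with the same nesting as offsets
    sampler: bilinear_sample or discrete_sample
    """
    for lvl_off, lvl_log in zip(offsets, logits):
        assert len(lvl_off) == len(lvl_log), "one logit per sampling point"
    flat = [v for level in logits for v in level]
    m = max(flat)
    exps = [math.exp(v - m) for v in flat]  # numerically stable softmax
    z = sum(exps)
    weights = iter(e / z for e in exps)  # normalized over all points of all levels
    out = 0.0
    for fmap, level_offsets in zip(fmaps, offsets):
        for dx, dy in level_offsets:
            out += next(weights) * sampler(fmap, ref[0] + dx, ref[1] + dy)
    return out


if __name__ == "__main__":
    fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 level
    coarse = [[10.0, 20.0], [30.0, 40.0]]                              # 2x2 level
    offsets = [[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],  # 3 points on the fine level
               [(0.0, 0.0)]]                           # 1 point on the coarse level
    logits = [[0.0, 0.0, 0.0], [0.0]]
    print(multi_scale_deform_attn([fine, coarse], (0.3, 0.3), offsets, logits,
                                  discrete_sample))  # → 6.25
```

Because `discrete_sample` reduces to rounding plus an index lookup, it can be expressed with ordinary gather operations instead of an interpolation kernel. In this sketch, that is how the discrete operator sidesteps grid_sample. The trade-off is that sampling locations are quantized to the cell grid. The `offsets` argument shows how each scale can receive its own number of sampling points.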