Driven by the simple and effective Dense O2O, DEIM demonstrates faster convergence and enhanced performance. In this work, we extend it with DINOv3 features, resulting in DEIMv2. DEIMv2 spans eight model sizes from X to Atto, covering GPU, edge, and mobile deployment. For the X, L, M, and S variants, we adopt DINOv3-pretrained or distilled backbones and introduce a Spatial Tuning Adapter (STA), which efficiently converts DINOv3's single-scale output into multi-scale features and complements its strong semantics with fine-grained details to enhance detection. For the ultra-lightweight models (Nano, Pico, Femto, and Atto), we employ HGNetv2 with depth and width pruning to meet strict resource budgets. Together with a simplified decoder and an upgraded Dense O2O, this unified design enables DEIMv2 to achieve a superior performance-cost trade-off across diverse scenarios, establishing new state-of-the-art results. Notably, our largest model, DEIMv2-X, achieves 57.8 AP with only 50.3 million parameters, surpassing prior X-scale models that require over 60 million parameters for just 56.5 AP. On the compact side, DEIMv2-S is the first model under 10 million parameters (9.71 million) to exceed the 50 AP milestone on COCO, reaching 50.9 AP. Even the ultra-lightweight DEIMv2-Pico, with just 1.5 million parameters, delivers 38.5 AP, matching YOLOv10-Nano (2.3 million) with roughly 35 percent fewer parameters. Our code and pre-trained models are available at https://github.com/Intellindust-AI-Lab/DEIMv2.
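The abstract does not detail how the Spatial Tuning Adapter builds its multi-scale features. As a minimal illustrative sketch only (not the paper's actual STA design), the following NumPy snippet shows the general idea of expanding a single-scale backbone feature map into a three-level pyramid via nearest-neighbor upsampling and 2x2 average pooling; the function name and the choice of scales are assumptions for illustration.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling along the spatial axes (H, W).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def avgpool2x(x):
    # 2x2 average pooling; assumes H and W are divisible by 2.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def to_multi_scale(feat):
    """Hypothetical sketch: expand one (C, H, W) single-scale map into
    three scales (2H x 2W, H x W, H/2 x W/2), mimicking how an adapter
    might feed a multi-scale detection head from a ViT's single-scale
    output. The real STA is not specified in the abstract."""
    return [upsample2x(feat), feat, avgpool2x(feat)]

# Example: a DINOv3-style single-scale output at stride 16 for a 640px input.
feat = np.random.rand(256, 40, 40).astype(np.float32)
pyramid = to_multi_scale(feat)
print([p.shape for p in pyramid])
```

In a real adapter, each level would additionally pass through learned convolutions to refine details; this sketch only demonstrates the single-scale-to-multi-scale shape transformation.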