YOLO26: A Comprehensive Architecture Overview and Key Improvements

You Only Look Once (YOLO) has been a prominent model family for computer vision in deep learning for a decade. This study explores the novel aspects of YOLO26, the most recent version in the YOLO series. Its primary enhancements are the elimination of Distribution Focal Loss (DFL), end-to-end NMS-free inference, the introduction of ProgLoss + Small-Target-Aware Label Assignment (STAL), and the use of the MuSGD optimizer. These changes are designed to improve inference speed, with a claimed 43% boost in CPU mode, so that YOLO26 can attain real-time performance on edge devices and on machines without GPUs. YOLO26 also offers improvements in other computer vision tasks, including instance segmentation, pose estimation, and oriented bounding box (OBB) decoding. We aim to provide more value than a consolidation of the information already available in the existing technical documentation. We therefore performed a rigorous architectural investigation of YOLO26, relying mostly on the source code in its GitHub repository and on its official documentation. The authentic, detailed operational mechanisms of YOLO26 reside in the source code, which is seldom examined by others. The outcome of the investigation is a YOLO26 architectural diagram. To our knowledge, this study is the first to present the CNN-based architecture that forms the core of YOLO26. Our objective is to give researchers and developers who aspire to enhance the YOLO model a precise understanding of the YOLO26 architecture, so that YOLO remains a leading deep learning model in computer vision.
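To make the term "NMS-free inference" concrete, the following is a minimal sketch contrasting classic post-processing (confidence threshold followed by greedy non-maximum suppression) with an end-to-end style step that only thresholds and keeps the top-scoring predictions. It is not taken from the YOLO26 source code: the function names, thresholds, and toy boxes are hypothetical, and the sketch assumes a detection head that already emits one prediction per object.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), illustrative format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_postprocess(boxes: List[Box], scores: List[float],
                    conf_thres: float = 0.25, iou_thres: float = 0.5) -> List[int]:
    """Classic pipeline: threshold, sort by score, greedily suppress overlaps."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


def nms_free_postprocess(scores: List[float],
                         conf_thres: float = 0.25, max_det: int = 300) -> List[int]:
    """End-to-end style step (assumption: the head predicts one box per object),
    so no pairwise IoU computation is needed: threshold, then top-k."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    return order[:max_det]


# Toy data: boxes 0 and 1 overlap heavily, box 2 is a separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
dense_scores = [0.9, 0.8, 0.7]        # a dense head scores the duplicate highly
one_to_one_scores = [0.9, 0.05, 0.7]  # a one-to-one head is assumed to score it low

print(nms_postprocess(boxes, dense_scores))      # duplicate removed by IoU test
print(nms_free_postprocess(one_to_one_scores))   # duplicate removed by threshold
```

In this sketch the NMS-free path avoids the pairwise IoU loop entirely, which is one plausible reason such a design can help on CPU-only and edge deployments; the actual speed gain depends on the model and runtime.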

