Model Compression Methods for YOLOv5: A Review

Over the past few years, extensive research has been devoted to enhancing YOLO object detectors. Since YOLO's introduction, eight major versions have been released with the aim of improving its accuracy and efficiency. While the evident merits of YOLO have led to its extensive use in many areas, deploying it on resource-limited devices remains challenging. To address this issue, various neural network compression methods have been developed, which fall under three main categories: network pruning, quantization, and knowledge distillation. The benefits of model compression, such as lower memory usage and shorter inference time, make these methods favorable, if not necessary, for deploying large neural networks on hardware-constrained edge devices. In this review paper, we focus on pruning and quantization because of their comparative modularity. We categorize these methods and analyze the practical results of applying them to YOLOv5. By doing so, we identify gaps in adapting pruning and quantization for compressing YOLOv5 and provide future directions for further exploration in this area. Among the several versions of YOLO, we specifically choose YOLOv5 for its excellent trade-off between recency and popularity in the literature. This is the first review paper that specifically surveys pruning and quantization methods from an implementation point of view on YOLOv5. Our study is also extendable to newer versions of YOLO, as deploying them on resource-limited devices poses the same challenges, which persist even today. This paper targets those interested in the practical deployment of model compression methods on YOLOv5 and in exploring compression techniques that can be used for subsequent versions of YOLO.
