Object detection is a fundamental task in computer vision and image understanding: the goal is to identify and localize objects of interest within an image and to assign each one a class label. Traditional methods, which relied on handcrafted features and shallow models, struggled with complex visual data and showed limited performance. They combined low-level features with contextual information but could not capture high-level semantics. Deep learning, especially Convolutional Neural Networks (CNNs), addressed these limitations by automatically learning rich, hierarchical features directly from data, including the high-level semantic representations essential for accurate object detection. This paper reviews object detection frameworks, starting with classical computer vision methods, and categorizes the approaches into two groups: (1) classical computer vision techniques and (2) CNN-based detectors. We compare the major CNN models and discuss their strengths and limitations. The review highlights the significant advances that deep learning has brought to object detection and identifies key areas for further research to improve performance.