With the ever-growing variety of object detection approaches, this study explores a series of experiments combining reinforcement learning (RL)-based visual attention methods with saliency ranking techniques in search of transparent and sustainable solutions. By using saliency ranking to produce an initial bounding box prediction and then applying RL to refine that prediction through a finite set of actions over multiple time steps, this study aims to improve the accuracy of RL-based object detection. The experiments investigate various image feature extraction methods and explore diverse Deep Q-Network (DQN) architectural variations for training the deep reinforcement learning localisation agent. We also optimise every stage of the detection pipeline by prioritising lightweight, faster models, while adding the capability to classify detected objects, a feature absent from previous RL approaches. Evaluating the trained agents on the Pascal VOC 2007 dataset shows that faster and more optimised models were obtained. Notably, the best mean Average Precision (mAP) achieved in this study was 51.4, surpassing benchmarks set by RL-based single object detectors in the literature.
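To make the refinement loop concrete, the sketch below shows one common formulation of an RL localisation agent: a greedy policy over a finite action set that translates or rescales a candidate box until a terminal "trigger" action is chosen. The specific action names, the step-size factor `ALPHA`, and the `q_values_fn` interface are illustrative assumptions, not the exact design used in this study.

```python
# Hypothetical finite action set for bounding box refinement: four
# translations, two scale changes, and a terminal "trigger" action.
# (Illustrative assumption; the paper's exact action set may differ.)
ACTIONS = ["left", "right", "up", "down", "bigger", "smaller", "trigger"]
ALPHA = 0.2  # step size relative to box dimensions (assumed value)

def apply_action(box, action):
    """Apply one refinement action to a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    dw, dh = ALPHA * (x2 - x1), ALPHA * (y2 - y1)
    if action == "left":
        x1, x2 = x1 - dw, x2 - dw
    elif action == "right":
        x1, x2 = x1 + dw, x2 + dw
    elif action == "up":
        y1, y2 = y1 - dh, y2 - dh
    elif action == "down":
        y1, y2 = y1 + dh, y2 + dh
    elif action == "bigger":
        x1, y1, x2, y2 = x1 - dw, y1 - dh, x2 + dw, y2 + dh
    elif action == "smaller":
        x1, y1, x2, y2 = x1 + dw, y1 + dh, x2 - dw, y2 - dh
    return (x1, y1, x2, y2)

def refine(box, q_values_fn, max_steps=10):
    """Greedy refinement: pick the argmax-Q action each step until
    the agent triggers, i.e. declares the box final.

    q_values_fn stands in for a trained DQN: it maps the current box
    (and, in practice, its image features) to one Q-value per action.
    """
    for _ in range(max_steps):
        q = list(q_values_fn(box))
        action = ACTIONS[q.index(max(q))]
        if action == "trigger":
            break
        box = apply_action(box, action)
    return box
```

In this setup, the saliency-ranked box would serve as the initial `box`, so the agent starts near the object instead of from the whole image, which is what allows a small, fixed action budget to suffice.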