SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients

Small object detection in aerial imagery presents significant challenges in computer vision due to the minimal data inherent in small-sized objects and their propensity to be obscured by larger objects and background noise. Traditional methods using transformer-based models often face limitations stemming from the lack of specialized databases, which adversely affect their performance with objects of varying orientations and scales. This underscores the need for more adaptable, lightweight models. In response, this paper introduces two innovative approaches that significantly enhance detection and segmentation capabilities for small aerial objects. Firstly, we explore the use of the SAHI framework on the newly introduced lightweight YOLO v9 architecture, which utilizes Programmable Gradient Information (PGI) to reduce the substantial information loss typically encountered in sequential feature extraction processes. The paper employs the Vision Mamba model, which incorporates position embeddings to facilitate precise location-aware visual understanding, combined with a novel bidirectional State Space Model (SSM) for effective visual context modeling. This State Space Model adeptly harnesses the linear complexity of CNNs and the global receptive field of Transformers, making it particularly effective in remote sensing image classification. Our experimental results demonstrate substantial improvements in detection accuracy and processing efficiency, validating the applicability of these approaches for real-time small object detection across diverse aerial scenarios. This paper also discusses how these methodologies could serve as foundational models for future advancements in aerial object recognition technologies. The source code will be made accessible here.

翻译：航拍图像中的小目标检测因目标尺寸小、易被大目标及背景噪声遮挡，成为计算机视觉领域的重大挑战。传统基于Transformer的模型受限于专用数据库的缺乏，在应对不同朝向和尺度目标时性能欠佳，亟需更灵活轻量的模型。为此，本文提出两种创新方法，显著提升航拍小目标的检测与分割能力。首先，我们在新型轻量级YOLO v9架构上探索SAHI框架的应用——该架构利用可编程梯度信息（PGI）减少序列特征提取中常见的信息损失问题。其次，本文采用融合位置嵌入的Vision Mamba模型实现精准的位置感知视觉理解，并结合新型双向状态空间模型(SSM)进行高效视觉上下文建模。该状态空间模型巧妙融合了CNN的线性复杂度与Transformer的全局感受野优势，在遥感图像分类中表现尤为突出。实验结果表明，该方法在检测精度与处理效率上均有显著提升，验证了其在多样化航拍场景中实现实时小目标检测的适用性。本文同时探讨了这些方法作为航拍目标识别技术未来发展的基础模型的潜力。相关源代码将在此公开。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日