We present an AI agent assisted by visual-context image retrieval-augmented generation (ImageRAG) for automatic target recognition (ATR) in synthetic aperture radar (SAR) imagery. SAR is a remote sensing modality used in defense and security applications to detect and monitor the positions of military vehicles, which can appear nearly indistinguishable in SAR images. SAR ATR has been studied extensively with the aim of better differentiating and identifying vehicle types, characteristics, and measurements. Recognition can be improved by comparing test examples against vehicle targets of known type. Recent methods build on neural networks, transformer attention, and multimodal large language models, and an agentic AI system can extend them by calling a defined set of tools, such as a search over a library of similar examples. Our proposed method, SAR Retrieval-Augmented Generation (SAR-RAG), combines a multimodal large language model (MLLM) with a vector database of semantic embeddings, enabling contextual search for image exemplars with known attributes. By retrieving past image examples whose true target types are known, SAR-RAG lets the model compare similar vehicle categories, which improves ATR prediction accuracy. We evaluate the approach using search and retrieval metrics, categorical classification accuracy, and numeric regression of vehicle dimensions. All of these metrics improve when SAR-RAG is attached to an MLLM baseline as an ATR memory bank.
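The retrieval idea described above can be illustrated with a minimal sketch. It assumes that each memory-bank entry stores three things: a precomputed embedding vector, a known target label, and one known dimension. The names `Exemplar`, `retrieve`, `predict`, and `precision_at_k`, the class names, the lengths, and the toy vectors are all hypothetical. The abstract does not specify the embedding model, the similarity measure, or how retrieved exemplars are passed to the MLLM. This sketch therefore assumes cosine similarity. In place of the MLLM's reasoning over the retrieved context, it uses a simple majority vote for the label and an average for the dimension. This is not the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One entry in a hypothetical ATR memory bank."""
    embedding: list[float]  # semantic embedding of a SAR image chip (assumed precomputed)
    label: str              # known true target type
    length_m: float         # known vehicle length in metres (illustrative placeholder)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(bank: list[Exemplar], query: list[float], k: int) -> list[Exemplar]:
    """Return the k exemplars whose embeddings are most similar to the query."""
    ranked = sorted(bank, key=lambda e: cosine(e.embedding, query), reverse=True)
    return ranked[:k]


def predict(bank: list[Exemplar], query: list[float], k: int = 3) -> tuple[str, float, list[Exemplar]]:
    """Stand-in for the MLLM step: majority-vote label and mean length of the top-k hits."""
    hits = retrieve(bank, query, k)
    label = Counter(e.label for e in hits).most_common(1)[0][0]
    length = sum(e.length_m for e in hits) / len(hits)
    return label, length, hits


def precision_at_k(hits: list[Exemplar], true_label: str) -> float:
    """Fraction of retrieved exemplars whose label matches the true target type."""
    return sum(e.label == true_label for e in hits) / len(hits)


# Toy memory bank with made-up embeddings, classes, and lengths.
bank = [
    Exemplar([1.0, 0.0, 0.1], "class_A", 7.0),
    Exemplar([0.9, 0.1, 0.0], "class_A", 7.0),
    Exemplar([0.0, 1.0, 0.2], "class_B", 6.0),
    Exemplar([0.1, 0.9, 0.0], "class_B", 6.0),
    Exemplar([0.8, 0.3, 0.1], "class_B", 6.0),
]
query = [1.0, 0.05, 0.05]  # embedding of an unlabeled test chip
label, length, hits = predict(bank, query, k=3)
print(label, round(length, 2))  # prints: class_A 6.67
```

A production system would typically replace the linear scan in `retrieve` with an approximate nearest-neighbor index in a vector database. `precision_at_k` is one example of the kind of retrieval metric the abstract mentions. Which retrieval metrics the paper actually reports is not stated here.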