ZODS-RS -- Zero-training Oriented Detection & Segmentation for Remote Sensing

Remote-sensing and UAV applications need models that generalize across platforms and viewpoints without task-specific training. Yet training-free pipelines often falter on oriented geometry, on scale and rotation variation, and in crowded scenes such as ports and airfields, and they rarely unify detection and segmentation. We introduce ZODS-RS, a training-free, closed-form pipeline that outputs horizontal bounding boxes (HBB) and instance masks. Built on DINOv3 dense features and SAM-style proposals, ZODS-RS chains three stages: PP (prototype purification via Tyler covariance), R-SEM (rotation-scale equivariant matching with separable kernels and global Hungarian assignment), and UAM (uncertainty-aware pixelwise merging with adaptive priors and optional negative prototypes). A lightweight CWLA module fuses multiple DINOv3 layers. On FAIR1M (HBB) we obtain $\mathrm{mAP}_{0.50:0.95}=\mathbf{13.06}$ and $\mathrm{AP}_S=\mathbf{2.93}$ \emph{(class-averaged over ship/airplane)}; on xView (HBB) we report $\mathrm{mAP}=\mathbf{16.69}$. On our UAV dataset, running on a single 5090 GPU, ZODS-RS achieves mask $\mathrm{mIoU}=\mathbf{31.10}$ and improves small-object AP by $\mathbf{+30.70}$ over Grounded-SAM. Our contributions are threefold: (i) a unified, \emph{no-training} solution for horizontal-box detection plus instance segmentation in aerial imagery; (ii) explicit closed-form formulations of PP, R-SEM, and UAM, tightly coupled with DINOv3; and (iii) \emph{consistent} gains on small and crowded targets and under cross-domain shifts, while keeping deployment simple.
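
To make the PP and R-SEM steps named above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "Tyler covariance" refers to Tyler's M-estimator of scatter (computed here with the standard fixed-point iteration), that purification means discarding the features with the largest Mahalanobis distance before averaging, and that the Hungarian assignment runs on a cosine-distance cost between proposal features and prototypes. The function names, the `keep` fraction, and the median centering are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def tyler_scatter(x, n_iter=100, tol=1e-6):
    """Tyler's M-estimator of scatter for centered samples x of shape (n, p).

    Requires n > p so the estimate stays invertible. The result is a shape
    matrix normalized to trace p (scale is not identifiable).
    """
    n, p = x.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        # Squared Mahalanobis norms x_i^T sigma^{-1} x_i under the current estimate.
        d = np.einsum("ij,ij->i", x @ np.linalg.inv(sigma), x)
        d = np.maximum(d, 1e-12)  # guard against a sample sitting at the center
        new = (p / n) * (x / d[:, None]).T @ x
        new *= p / np.trace(new)
        converged = np.linalg.norm(new - sigma) < tol
        sigma = new
        if converged:
            break
    return sigma


def purify_prototype(feats, keep=0.8):
    """Hypothetical purification: average the `keep` fraction of features
    that lie closest to the coordinate-wise median in Tyler-Mahalanobis distance."""
    centered = feats - np.median(feats, axis=0)
    sigma = tyler_scatter(centered)
    d = np.einsum("ij,ij->i", centered @ np.linalg.inv(sigma), centered)
    n_keep = max(1, int(round(keep * len(feats))))
    inliers = np.argsort(d)[:n_keep]
    return feats[inliers].mean(axis=0)


def match_proposals(proposal_feats, prototypes):
    """One-to-one assignment of proposals to prototypes that minimizes the
    total cosine distance (Hungarian / linear sum assignment)."""
    a = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    b = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

In this sketch, `purify_prototype` would be called once per class on the reference features, and `match_proposals` would then pair the pooled feature of each SAM-style proposal with at most one prototype. The rotation-scale equivariant kernels and the uncertainty-aware merging are not modeled here.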
