Remote-sensing and UAV applications need models that generalize across platforms and viewpoints without task-specific training. Yet training-free pipelines often falter on oriented geometry, scale and rotation variation, and crowded scenes such as ports or airfields, and they rarely unify detection and segmentation. We introduce ZODS-RS, a training-free, closed-form pipeline that outputs horizontal bounding boxes (HBB) and instance masks. Built on DINOv3 dense features and SAM-style proposals, ZODS-RS chains three stages: PP (prototype purification via Tyler covariance), R-SEM (rotation-scale equivariant matching with separable kernels and global Hungarian assignment), and UAM (uncertainty-aware pixelwise merging with adaptive priors and optional negative prototypes). A lightweight CWLA module fuses multiple DINOv3 layers. On FAIR1M (HBB) we obtain $\mathrm{mAP}_{0.50:0.95}=\mathbf{13.06}$ and $\mathrm{AP}_S=\mathbf{2.93}$ \emph{(class-averaged over ship/airplane)}; on xView (HBB) we report $\mathrm{mAP}=\mathbf{16.69}$. On our UAV dataset, running on a single 5090 GPU, ZODS-RS achieves mask $\mathrm{mIoU}=\mathbf{31.10}$ and improves small-object AP by $\mathbf{+30.70}$ over Grounded-SAM. This work contributes (i) a unified, \emph{no-training} solution for horizontal-box detection plus instance segmentation in aerial imagery; (ii) explicit closed-form formulations for PP, R-SEM, and UAM that are tightly coupled with DINOv3; and (iii) \emph{consistent} gains on small and crowded targets and under cross-domain shifts, while keeping deployment simple.
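
For orientation, the two standard building blocks named in PP and R-SEM can be written in their textbook forms. The abstract does not give the paper's exact formulations, so the symbols below ($x_i$, $p$, $n$, $m$, $q$, $C_{ij}$, $X_{ij}$) are illustrative assumptions rather than the authors' notation. For $n$ centered $p$-dimensional features $x_1,\dots,x_n$, Tyler's estimator of the shape matrix $\Sigma$ is commonly computed by the fixed-point iteration
\[
\Sigma_{k+1} = \frac{p}{n}\sum_{i=1}^{n}\frac{x_i x_i^{\top}}{x_i^{\top}\Sigma_k^{-1}x_i},
\qquad
\Sigma_{k+1} \leftarrow \frac{p\,\Sigma_{k+1}}{\mathrm{tr}(\Sigma_{k+1})},
\]
which down-weights samples with large Mahalanobis-type distance and is presumably why it suits purifying prototypes contaminated by outliers. A global Hungarian assignment between $m$ prototypes and $q$ proposals with cost matrix $C$ solves
\[
\min_{X\in\{0,1\}^{m\times q}} \sum_{i,j} C_{ij}X_{ij}
\quad \mathrm{s.t.} \quad
\sum_{j}X_{ij}\le 1,\;\; \sum_{i}X_{ij}\le 1,\;\; \sum_{i,j}X_{ij}=\min(m,q).
\]
How ZODS-RS constructs $C$ from the rotation-scale equivariant matching scores is not specified in this abstract.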