HMR-Net: Hierarchical Modular Routing for Cross-Domain Object Detection in Aerial Images

Despite advances in object detection, aerial imagery remains a challenging domain: models often fail to generalize across variations in spatial resolution, scene composition, and semantic label coverage. Differences in geographic context, sensor characteristics, and object distributions across datasets limit the capacity of conventional models to learn consistent and transferable representations. A single shared model trained on such data tends to impose one unified representation on fundamentally different domains, which results in poor performance on region-specific content and limited flexibility when dealing with novel object categories. To address this, we propose a modular learning framework that enables structured specialization in aerial detection. Our method introduces a hierarchical routing mechanism with two levels of modularity: a domain routing layer that uses latent geographic embeddings to assign inputs to domain-specialized expert modules, and a scene routing mechanism that allocates image subregions to scene-specific expert modules. This allows the model to specialize both across datasets and within complex scenes. Additionally, the framework contains a conditional expert module that uses external semantic information (e.g., category names or textual descriptions) to detect novel object categories at inference time, without retraining or fine-tuning. By moving beyond monolithic representations, our method provides an adaptive framework for remote sensing object detection. Comprehensive evaluations on four datasets show improvements in multi-dataset generalization, region-level specialization, and open-category detection.
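The two-level routing described above can be illustrated with a minimal sketch. The abstract does not specify the gating function, so the softmax over dot-product scores, the top-1 expert selection, and all names below (`route`, `hierarchical_route`, `domain_keys`, `scene_keys`) are assumptions made for illustration, not the authors' implementation. The embeddings are toy vectors standing in for the latent geographic embedding of an image and the features of its subregions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def route(embedding, expert_keys):
    """Score an embedding against one key per expert (assumed gating rule).

    Returns the index of the highest-scoring expert and the gate probabilities.
    """
    probs = softmax([dot(embedding, key) for key in expert_keys])
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

def hierarchical_route(image_embedding, region_embeddings, domain_keys, scene_keys):
    """Level 1: pick one domain expert for the whole image.
    Level 2: pick one scene expert for each image subregion."""
    domain_idx, _ = route(image_embedding, domain_keys)
    scene_idx = [route(region, scene_keys)[0] for region in region_embeddings]
    return domain_idx, scene_idx

# Toy data (hypothetical): 2 domain experts, 3 scene experts, 3 subregions.
domain_keys = [[1.0, 0.0], [0.0, 1.0]]
scene_keys = [[1.0, 1.0], [-1.0, 1.0], [0.0, -1.0]]
image_embedding = [0.2, 0.9]
region_embeddings = [[0.8, 0.7], [-0.9, 0.5], [0.1, -1.0]]

domain_idx, scene_idx = hierarchical_route(
    image_embedding, region_embeddings, domain_keys, scene_keys
)
print(domain_idx, scene_idx)
```

In this sketch the whole image is sent to a single domain expert while each subregion is routed independently, which mirrors the "across datasets and within complex scenes" split. A learned router would typically replace the fixed keys with trainable parameters and might use soft or top-k mixing instead of the hard top-1 choice assumed here.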
