Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning

Accurate school detection is essential for supporting education initiatives such as infrastructure planning and expanding internet connectivity to underserved areas. However, in many regions of the world, official school records are outdated, incomplete, or unavailable. Manual mapping efforts, while valuable, are labor-intensive and do not scale across large geographic areas. To address this, we propose a weakly supervised framework for school detection from aerial imagery that minimizes the need for human annotations while supporting global mapping efforts. Our method is specifically designed for the low-data regime, where manual annotations are extremely scarce. We introduce an automatic labeling pipeline that combines sparse school location points with semantic segmentation to generate infrastructure masks, from which bounding boxes are derived. In a first training stage, we train our detectors on these automatically labeled images so that they learn a representation of what schools look like; in a second stage, we fine-tune the resulting models on a small, clean set of manually labeled images. This two-stage training pipeline enables accurate, large-scale detection of school infrastructure in low-data settings with minimal supervision. Our experiments show strong object detection performance, particularly in the low-data regime: with only 50 manually labeled images, the models achieve promising results, substantially reducing the need for costly annotations. This framework supports education and connectivity initiatives worldwide by providing an efficient and extensible approach to mapping schools from space. All models, training code, and auto-labeled data will be publicly released to foster future research and real-world impact.
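To make the auto-labeling step more concrete, the following is a minimal sketch of one plausible way to turn a binary infrastructure mask and a sparse school location point into a bounding box. It is an illustration under stated assumptions, not the authors' pipeline. The mask is assumed to come from an upstream semantic segmentation model, and the point is assumed to be already projected into pixel coordinates. The selection rule is a hypothetical heuristic: keep every 4-connected mask component within `radius` pixels of the point, drop components smaller than `min_pixels`, and box the union. The function names `connected_components` and `school_box_from_point` are also hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Pixel = Tuple[int, int]              # (row, col)
Box = Tuple[int, int, int, int]      # (x_min, y_min, x_max, y_max), inclusive


def connected_components(mask: List[List[int]]) -> List[List[Pixel]]:
    """Group nonzero pixels of a binary mask into 4-connected components (BFS)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components: List[List[Pixel]] = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels: List[Pixel] = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def school_box_from_point(
    mask: List[List[int]],
    point: Tuple[int, int],
    radius: float,
    min_pixels: int = 1,
) -> Optional[Box]:
    """Hypothetical rule: one box around all mask components near a school point.

    `point` is (x, y) in pixel coordinates. A component is kept if it has at
    least `min_pixels` pixels and its closest pixel lies within `radius` of
    the point. Returns None when nothing qualifies (e.g. a mislocated point).
    """
    px, py = point
    kept: List[Pixel] = []
    for comp in connected_components(mask):
        if len(comp) < min_pixels:
            continue  # discard small segmentation noise
        closest_sq = min((x - px) ** 2 + (y - py) ** 2 for y, x in comp)
        if closest_sq <= radius ** 2:
            kept.extend(comp)
    if not kept:
        return None
    xs = [x for _, x in kept]
    ys = [y for y, _ in kept]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 6x8 mask with three separate blobs; the school point sits at (x=2, y=2).
example_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(school_box_from_point(example_mask, (2, 2), radius=2))  # (1, 1, 3, 4)
```

In this toy example the two blobs near the point are merged into one box, while the blob on the right lies outside the radius and is ignored. In practice the radius would likely need to reflect the positional uncertainty of the source points and the ground sampling distance of the imagery: too small a radius misses school buildings, while too large a radius pulls neighboring non-school structures into the box.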