Rare object detection is a fundamental task in applied geospatial machine learning, but it is often challenging because of the large volume of high-resolution satellite or aerial imagery involved and the few or no labeled positive samples available at the outset. This paper addresses the problem of bootstrapping such a rare object detection task, assuming there is no labeled data and no spatial prior over the area of interest. We propose novel offline and online cluster-based approaches for sampling patches that are significantly more efficient than random sampling at exposing positive samples to a human annotator. We apply our methods to identify bomas, small enclosures for herd animals, in the Serengeti Mara region of Kenya and Tanzania. We demonstrate a significant improvement in detection efficiency, raising the positive sampling rate from 2% (random) to 30%. This improvement enables effective machine learning mapping even with minimal labeling budgets: with a budget of 300 total patches, we obtain an F1 score of 0.51 on the boma detection task.
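To give a concrete sense of what an online cluster-based sampler might look like, the following is a minimal sketch and not the method proposed in this paper. It assumes that patches have already been grouped into clusters (for example, by clustering image features) and uses Thompson sampling over clusters as a stand-in selection rule. The names `online_cluster_sampling`, `clusters`, and `oracle` are hypothetical, and the data is synthetic.

```python
import random


def online_cluster_sampling(clusters, oracle, budget, seed=0):
    """Pick patches to label, favoring clusters that have yielded positives.

    clusters: dict mapping cluster id -> list of patch ids (assumed precomputed).
    oracle:   callable(patch_id) -> bool, standing in for the human annotator.
    budget:   maximum number of patches to label.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior on each cluster's positive rate: [alpha, beta].
    stats = {c: [1, 1] for c in clusters}
    # Copy and shuffle each cluster so patches are drawn without replacement.
    pools = {c: list(patches) for c, patches in clusters.items()}
    for pool in pools.values():
        rng.shuffle(pool)

    labeled = []
    for _ in range(budget):
        candidates = [c for c in pools if pools[c]]
        if not candidates:
            break
        # Thompson sampling: draw a plausible positive rate for each cluster
        # from its posterior and label a patch from the highest draw.
        chosen = max(candidates, key=lambda c: rng.betavariate(*stats[c]))
        patch = pools[chosen].pop()
        is_positive = bool(oracle(patch))
        stats[chosen][0 if is_positive else 1] += 1
        labeled.append((patch, is_positive))
    return labeled


# Synthetic example (assumed data): 10 clusters of 100 patches each,
# with all positives concentrated in cluster 3 (overall base rate 5%).
clusters = {c: list(range(c * 100, (c + 1) * 100)) for c in range(10)}
positives = set(range(300, 350))

labeled = online_cluster_sampling(clusters, lambda p: p in positives, budget=100)
num_positive = sum(1 for _, is_pos in labeled if is_pos)
print(f"positives found: {num_positive} of {len(labeled)} labeled patches")
```

In this toy setting, uniform random sampling would be expected to surface about 5 positives in 100 labels, so the printed count gives a rough feel for how concentrating the budget on promising clusters can raise the positive sampling rate. The actual gains reported above depend on the paper's own clustering and sampling procedures, which this sketch does not reproduce.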