Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to detecting human trafficking. We model this class of problems in a visual active search (VAS) framework, which has three key inputs: (1) an image of the entire search area, which is subdivided into regions, (2) a local search function, which determines whether a previously unseen object class is present in a given region, and (3) a fixed search budget, which limits the number of times the local search function can be evaluated. The goal is to maximize the number of objects found within the search budget. We propose a reinforcement learning approach for VAS that learns a meta-search policy from a collection of fully annotated search tasks. This meta-search policy is then used to dynamically search for a novel target-object class, leveraging the outcomes of previous queries to determine where to query next. Through extensive experiments on several large-scale satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. We also propose novel domain adaptation techniques that improve the policy at decision time when there is a significant domain gap with the training data. Code is publicly available.
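The search loop described in the abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's method: the learned meta-search policy is replaced by a hand-written neighbor heuristic, the "number of objects found" is simplified to the number of queried regions that contain the target, and every name here (`run_search`, `make_neighbor_policy`, `prior`, `truth`, the 3x3 grid) is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

History = Dict[int, bool]  # region index -> outcome of the local search there


def run_search(
    num_regions: int,
    local_search: Callable[[int], bool],
    policy: Callable[[History], int],
    budget: int,
) -> Tuple[int, History]:
    """Run one budget-limited episode; return (positive regions found, history)."""
    history: History = {}
    for _ in range(min(budget, num_regions)):
        region = policy(history)  # the policy sees all previous outcomes
        if region in history or not 0 <= region < num_regions:
            raise ValueError(f"invalid query: {region}")
        history[region] = local_search(region)  # each call costs one unit of budget
    return sum(history.values()), history


def make_neighbor_policy(
    prior: Sequence[float], width: int, weight: float = 0.5
) -> Callable[[History], int]:
    """Hypothetical stand-in for a learned policy (assumption, not from the paper).

    Each unqueried region is scored by its prior, raised by `weight` for every
    adjacent queried region that was a hit and lowered by `weight` for every
    adjacent miss. With weight=0.0 the policy ignores query outcomes.
    """

    def policy(history: History) -> int:
        def score(region: int) -> float:
            row, col = divmod(region, width)
            adjust = 0.0
            for seen, hit in history.items():
                r, c = divmod(seen, width)
                if abs(r - row) + abs(c - col) == 1:  # 4-neighborhood
                    adjust += weight if hit else -weight
            return prior[region] + adjust

        candidates = [r for r in range(len(prior)) if r not in history]
        return max(candidates, key=score)

    return policy


# Toy 3x3 search area, flattened row by row; regions 4, 5 and 8 hold targets.
truth: List[int] = [0, 0, 0, 0, 1, 1, 0, 0, 1]
prior: List[float] = [0.1, 0.2, 0.1, 0.2, 0.6, 0.3, 0.5, 0.2, 0.25]


def local_search(region: int) -> bool:
    return bool(truth[region])


found_adaptive, history_adaptive = run_search(
    9, local_search, make_neighbor_policy(prior, width=3), budget=3
)
found_static, history_static = run_search(
    9, local_search, make_neighbor_policy(prior, width=3, weight=0.0), budget=3
)
print(found_adaptive, list(history_adaptive))
print(found_static, list(history_static))
```

With a budget of three queries, the outcome-aware policy follows each hit into neighboring regions, while the variant with `weight=0.0` simply queries regions in order of their prior. The gap between the two on this toy grid illustrates why conditioning on previous query outcomes can matter; it says nothing about the magnitude of the gains reported for the learned policy.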