Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to detecting human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. A crucial feature of VAS is that each such query is informative about the spatial distribution of target objects beyond what is captured visually (for example, due to spatial correlation). We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when the training data does not fully reflect the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.
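The query loop described above can be illustrated with a minimal sketch. It is not the proposed reinforcement learning policy: it assumes the area is split into a grid of cells, that each cell has a hypothetical prior score (for example, from an image model), and that a simple greedy rule picks the next cell. The names `run_episode`, `greedy_policy`, `neighbors`, and the `boost` parameter are all illustrative assumptions. The point is only to show how a query outcome can shift beliefs about nearby cells when targets are spatially correlated.

```python
from typing import List, Set, Tuple


def neighbors(cell: int, width: int, n_cells: int) -> List[int]:
    """Return the 4-connected neighbors of a cell in a row-major grid."""
    row, col = divmod(cell, width)
    height = n_cells // width
    result = []
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < height and 0 <= c < width:
            result.append(r * width + c)
    return result


def greedy_policy(scores: List[float], queried: Set[int]) -> int:
    """Pick the unqueried cell with the highest current score (a stand-in policy)."""
    candidates = [i for i in range(len(scores)) if i not in queried]
    return max(candidates, key=lambda i: scores[i])


def run_episode(
    labels: List[int],      # ground truth: 1 if the cell contains a target
    scores: List[float],    # hypothetical prior scores, one per cell
    budget: int,            # maximum number of queries
    width: int,             # grid width in cells
    boost: float = 0.5,     # assumed strength of the spatial-correlation update
) -> Tuple[int, List[int]]:
    """Run one search episode; return (targets found, cells in query order)."""
    scores = list(scores)   # copy so the caller's list is not modified
    queried: List[int] = []
    queried_set: Set[int] = set()
    found = 0
    for _ in range(min(budget, len(labels))):
        cell = greedy_policy(scores, queried_set)
        queried.append(cell)
        queried_set.add(cell)
        outcome = labels[cell]          # the query reveals the true label
        found += outcome
        # Assumed update: a hit raises neighboring scores, a miss lowers them.
        delta = boost if outcome else -boost
        for nb in neighbors(cell, width, len(labels)):
            scores[nb] += delta
    return found, queried


# A 3x3 grid with two adjacent targets in the top-left corner.
labels = [1, 1, 0,
          0, 0, 0,
          0, 0, 0]
prior = [0.6, 0.2, 0.2,
         0.1, 0.3, 0.1,
         0.1, 0.1, 0.5]

with_update = run_episode(labels, prior, budget=2, width=3)
without_update = run_episode(labels, prior, budget=2, width=3, boost=0.0)
print(with_update, without_update)
```

In this toy case, using the outcome of the first query redirects the second query to an adjacent cell, while ignoring it sends the second query to the cell with the next-highest prior score. A learned policy would replace `greedy_policy` and the hand-written update.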