Aerial View Goal Localization with Reinforcement Learning

Climate-induced disasters are on the rise and will continue to be, and thus search-and-rescue (SAR) operations, where the task is to localize and assist one or several missing people, are becoming increasingly relevant. In many cases the rough location may be known, and a UAV can be deployed to explore a given, confined area to precisely localize the missing people. Due to time and battery constraints, it is often critical that localization is performed as efficiently as possible. In this work we approach this type of problem by abstracting it as an aerial view goal localization task in a framework that emulates a SAR-like setup without requiring access to actual UAVs. In this framework, an agent operates on top of an aerial image (a proxy for a search area) and is tasked with localizing a goal that is described in terms of visual cues. To further mimic the situation on an actual UAV, the agent cannot observe the search area in its entirety, not even at low resolution, and thus has to rely solely on partial glimpses when navigating towards the goal. To tackle this task, we propose AiRLoc, a reinforcement learning (RL)-based model that decouples exploration (searching for distant goals) and exploitation (localizing nearby goals). Extensive evaluations show that AiRLoc outperforms heuristic search methods as well as alternative learnable approaches, and that it generalizes across datasets, e.g., to disaster-hit areas without seeing a single disaster scenario during training. We also conduct a proof-of-concept study which indicates that the learnable methods outperform humans on average. Code and models have been made publicly available at https://github.com/aleksispi/airloc.
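To make the task setup concrete, the following is a minimal sketch of a glimpse-based goal localization environment. It is not the AiRLoc implementation: the class name `GlimpseEnv`, the 5x5 grid of non-overlapping patches, the four-move action set, the step budget, and the random baseline `run_random_episode` are all assumptions chosen for illustration. The only properties taken from the description above are that the agent acts on top of an aerial image, that the goal is given by visual cues (here, the goal patch), and that the agent never sees the full image.

```python
import numpy as np

# Hypothetical action set: four axis-aligned moves as (row delta, col delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class GlimpseEnv:
    """Toy goal localization on a grid of non-overlapping image patches."""

    def __init__(self, image, grid=5, max_steps=10, seed=0):
        self.image = image
        self.grid = grid                  # assumed grid size, not from the paper
        self.max_steps = max_steps        # assumed step budget
        self.ph = image.shape[0] // grid  # patch height in pixels
        self.pw = image.shape[1] // grid  # patch width in pixels
        self.rng = np.random.default_rng(seed)

    def _patch(self, cell):
        r, c = cell
        return self.image[r * self.ph:(r + 1) * self.ph,
                          c * self.pw:(c + 1) * self.pw]

    def _obs(self):
        # Only the current patch and the goal patch are visible,
        # never the full search area.
        return {"current": self._patch(self.pos),
                "goal": self._patch(self.goal)}

    def reset(self):
        # Sample two distinct cells for the start and the goal.
        start, goal = self.rng.choice(self.grid ** 2, size=2, replace=False)
        self.pos = (int(start) // self.grid, int(start) % self.grid)
        self.goal = (int(goal) // self.grid, int(goal) % self.grid)
        self.steps = 0
        return self._obs()

    def step(self, action):
        dr, dc = ACTIONS[action]
        # Moves that would leave the search area keep the agent at the border.
        r = min(max(self.pos[0] + dr, 0), self.grid - 1)
        c = min(max(self.pos[1] + dc, 0), self.grid - 1)
        self.pos = (r, c)
        self.steps += 1
        success = self.pos == self.goal
        done = success or self.steps >= self.max_steps
        return self._obs(), success, done


def run_random_episode(env, rng):
    """Uniformly random policy, a stand-in for a simple heuristic baseline."""
    env.reset()
    names = list(ACTIONS)
    success, done = False, False
    while not done:
        action = names[rng.integers(len(names))]
        _, success, done = env.step(action)
    return success, env.steps
```

A learned policy would replace the random action choice with a function of `obs["current"]` and `obs["goal"]` (and possibly the history of visited patches). Under these assumptions, success within the step budget is the natural quantity to compare across methods.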
