Aerial View Localization with Reinforcement Learning: Towards Emulating Search-and-Rescue

Climate-induced disasters are on the rise and will continue to be, so search-and-rescue (SAR) operations, where the task is to localize and assist one or several missing people, are becoming increasingly relevant. In many cases the rough location is known, and a UAV can be deployed to explore a given, confined area in order to precisely localize the missing people. Due to time and battery constraints, it is often critical that localization is performed as efficiently as possible. In this work we approach this type of problem by abstracting it as an aerial view goal localization task, in a framework that emulates a SAR-like setup without requiring access to actual UAVs. In this framework, an agent operates on top of an aerial image (a proxy for a search area) and is tasked with localizing a goal that is described in terms of visual cues. To further mimic the situation on an actual UAV, the agent cannot observe the search area in its entirety, not even at low resolution, and thus has to rely solely on partial glimpses when navigating towards the goal. To tackle this task, we propose AiRLoc, a reinforcement learning (RL)-based model that decouples exploration (searching for distant goals) and exploitation (localizing nearby goals). Extensive evaluations show that AiRLoc outperforms heuristic search methods as well as alternative learnable approaches, and that it generalizes across datasets, e.g. to disaster-hit areas without having seen a single disaster scenario during training. We also conduct a proof-of-concept study which indicates that the learnable methods outperform humans on average. Code and models have been made publicly available at https://github.com/aleksispi/airloc.
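
To make the task setup concrete, the following is a minimal sketch of a glimpse-based search episode. It is not the AiRLoc implementation or the code in the linked repository. The class name `GlimpseSearchEnv`, the 5x5 grid of patches, the eight-direction action set, the step budget, and the random-walk baseline are all assumptions introduced purely for illustration. The only properties taken from the description above are that the agent sits on top of an aerial image, sees only its current patch plus a visual description of the goal, and never observes the full search area.

```python
import numpy as np

# Hypothetical action set: eight compass moves as (row, col) offsets.
ACTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


class GlimpseSearchEnv:
    """Toy goal-localization episode on a grid of image patches (illustrative only)."""

    def __init__(self, image, grid=5, max_steps=10, seed=0):
        self.image = image            # aerial image, used as a proxy for the search area
        self.grid = grid              # assumed number of patches per side
        self.max_steps = max_steps    # assumed step budget (time/battery constraint)
        self.rng = np.random.default_rng(seed)
        self.ph = image.shape[0] // grid
        self.pw = image.shape[1] // grid

    def _patch(self, cell):
        # Crop the image region belonging to one grid cell.
        r, c = cell
        return self.image[r * self.ph:(r + 1) * self.ph,
                          c * self.pw:(c + 1) * self.pw]

    def reset(self):
        # Sample two distinct cells: the start position and the goal.
        cells = self.rng.choice(self.grid * self.grid, size=2, replace=False)
        self.pos = divmod(int(cells[0]), self.grid)
        self.goal = divmod(int(cells[1]), self.grid)
        self.steps = 0
        # The agent receives only its current glimpse and the goal's appearance.
        return self._patch(self.pos), self._patch(self.goal)

    def step(self, action):
        dr, dc = ACTIONS[action]
        # Moves that would leave the search area are clamped to the border.
        r = min(max(self.pos[0] + dr, 0), self.grid - 1)
        c = min(max(self.pos[1] + dc, 0), self.grid - 1)
        self.pos = (r, c)
        self.steps += 1
        found = self.pos == self.goal
        done = found or self.steps >= self.max_steps
        return self._patch(self.pos), found, done


def random_search(env, rng):
    """Random-walk baseline: returns True if the goal is reached within the budget."""
    env.reset()
    found, done = False, False
    while not done:
        _, found, done = env.step(int(rng.integers(len(ACTIONS))))
    return found
```

A learned policy would replace the random action choice in `random_search` with a function of the current glimpse, the goal patch, and the history of visited positions. The success rate of the random walk over many episodes then gives a simple reference point for such a policy.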
