Active Visual Exploration (AVE) is the task of dynamically selecting observations (glimpses), which is critical for comprehending and navigating an environment. While modern AVE methods have demonstrated impressive performance, they are constrained to fixed-scale glimpses from rigid grids. In contrast, existing mobile platforms equipped with optical zoom can capture glimpses of arbitrary position and scale. To close this gap between software and hardware capabilities, we introduce AdaGlimpse. It uses Soft Actor-Critic, a reinforcement learning algorithm tailored for exploration tasks, to select glimpses of arbitrary position and scale. This approach enables our model to rapidly establish a general awareness of the environment before zooming in for detailed analysis. Experimental results demonstrate that AdaGlimpse surpasses previous methods across various visual tasks while maintaining greater applicability in realistic AVE scenarios.
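To make the idea of a glimpse with arbitrary position and scale concrete, the following is a minimal sketch, not the AdaGlimpse implementation. It assumes a glimpse is described by a normalized center `(cx, cy)` and a normalized `scale`, and that every glimpse is resampled to the same fixed resolution, so a large region yields a coarse overview and a small region yields a detailed view. The function name `extract_glimpse`, its parameters, and the nearest-neighbor sampling are illustrative assumptions; the policy that would choose these values (Soft Actor-Critic in the abstract) is not shown.

```python
import numpy as np


def extract_glimpse(image, cx, cy, scale, out_size=32):
    """Return an out_size x out_size glimpse centered at (cx, cy).

    Hypothetical parameterization (not taken from the paper):
    cx, cy are in [0, 1] relative to image width and height, and
    scale is the side of the square region as a fraction of the
    shorter image side.
    """
    h, w = image.shape[:2]
    side = scale * min(h, w)
    # Sample at the centers of out_size equal cells spanning the region.
    offsets = (np.arange(out_size) + 0.5) / out_size - 0.5
    ys = cy * h + offsets * side
    xs = cx * w + offsets * side
    # Nearest-neighbor lookup; coordinates outside the image are clamped
    # to the border (a simplification of real optical zoom).
    yi = np.clip(np.floor(ys).astype(int), 0, h - 1)
    xi = np.clip(np.floor(xs).astype(int), 0, w - 1)
    return image[np.ix_(yi, xi)]


# Coarse-to-fine usage: first a wide overview, then a zoomed-in detail.
image = np.random.default_rng(0).random((128, 128, 3))
overview = extract_glimpse(image, 0.5, 0.5, 1.0)   # whole scene, low detail
detail = extract_glimpse(image, 0.25, 0.75, 0.2)   # small region, high detail
```

Both glimpses have the same array shape, which is what would let a single model consume a wide overview and a close-up interchangeably. A grid-based method corresponds to restricting `cx`, `cy`, and `scale` to a fixed set of values.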