Model-based methods have made substantial progress in separating task-irrelevant distractors from task-relevant content in visual control. However, prior work has focused mainly on heterogeneous distractors, such as noisy background videos. Homogeneous distractors, which closely resemble the controllable agent, remain largely unexplored and pose significant challenges to existing methods. To tackle this problem, we propose the Implicit Action Generator (IAG), which learns the implicit actions of visual distractors, and present a new algorithm, implicit Action-informed Diverse visual Distractors Distinguisher (AD3), which uses the actions inferred by IAG to train separated world models. Implicit actions effectively capture the behavior of background distractors, which helps distinguish the task-irrelevant components and allows the agent to optimize its policy within the task-relevant state space. Our method achieves superior performance on various visual control tasks featuring both heterogeneous and homogeneous distractors. We also empirically validate the indispensable role of the implicit actions learned by IAG.