Visual reinforcement learning (RL) aims to let an agent learn policies from visual observations, yet it remains vulnerable to dynamic visual perturbations, such as unpredictable shifts in corruption type. To study this systematically, we introduce the Visual Degraded Control Suite (VDCS), a benchmark that extends the DeepMind Control Suite with Markov-switching degradations to simulate non-stationary real-world perturbations. Experiments on VDCS reveal severe performance degradation in existing methods. Using an information-theoretic analysis, we prove that this failure stems from reconstruction-based objectives, which inevitably entangle perturbation artifacts into the latent representations. To mitigate this, we propose Agent-Centric Observations with Mixture-of-Experts (ACO-MoE), a framework that makes visual RL robust to such perturbations. ACO-MoE uses agent-centric restoration experts to restore corrupted observations and extract the task-relevant foreground, thereby decoupling perception from perturbation before the observations reach the RL agent. Extensive experiments on VDCS show that ACO-MoE outperforms strong baselines, recovering 95.3% of clean performance under challenging Markov-switching corruptions. Moreover, it achieves state-of-the-art results on DMControl Generalization with random-color and video-background perturbations, demonstrating strong robustness.
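To make the notion of a Markov-switching degradation concrete, the following is a minimal sketch of how such a process could be applied to a stream of image observations. It is not the VDCS implementation: the class name `MarkovSwitchingDegradation`, the four corruption functions, their strengths, and the transition matrix are all assumptions chosen for illustration. A hidden state selects the active corruption, and the state evolves according to a row-stochastic transition matrix, so the corruption type shifts unpredictably over an episode.

```python
import numpy as np


def identity(frame, rng):
    # No corruption: the clean observation passes through unchanged.
    return frame


def gaussian_noise(frame, rng):
    # Additive pixel noise; the standard deviation (25) is an illustrative choice.
    noisy = frame.astype(np.float32) + rng.normal(0.0, 25.0, size=frame.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def box_blur(frame, rng):
    # 3x3 mean filter built from shifted views of an edge-padded copy.
    h, w = frame.shape[:2]
    padded = np.pad(frame.astype(np.float32), ((1, 1), (1, 1), (0, 0)), mode="edge")
    acc = np.zeros(frame.shape, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return (acc / 9.0).astype(np.uint8)


def darken(frame, rng):
    # Global brightness reduction; the factor (0.4) is an illustrative choice.
    return (frame.astype(np.float32) * 0.4).astype(np.uint8)


class MarkovSwitchingDegradation:
    """Hypothetical wrapper: a Markov chain picks the corruption for each frame."""

    def __init__(self, corruptions, transition, seed=0):
        self.corruptions = list(corruptions)
        self.transition = np.asarray(transition, dtype=np.float64)
        k = len(self.corruptions)
        if self.transition.shape != (k, k):
            raise ValueError("transition must be a K x K matrix")
        if not np.allclose(self.transition.sum(axis=1), 1.0):
            raise ValueError("each row of transition must sum to 1")
        self.rng = np.random.default_rng(seed)
        self.state = 0  # start in the first corruption state

    def step(self, frame):
        # Apply the corruption of the current state, then sample the next state.
        applied = self.state
        out = self.corruptions[applied](frame, self.rng)
        probs = self.transition[self.state]
        self.state = int(self.rng.choice(len(self.corruptions), p=probs))
        return out, applied


# Usage: a "sticky" chain that mostly stays put and occasionally switches.
corruptions = [identity, gaussian_noise, box_blur, darken]
sticky = np.full((4, 4), 0.05) + np.eye(4) * 0.80  # each row sums to 1.0
degrader = MarkovSwitchingDegradation(corruptions, sticky, seed=0)
frame = np.random.default_rng(1).integers(0, 256, size=(84, 84, 3), dtype=np.uint8)
degraded, state_index = degrader.step(frame)
```

The large diagonal entries make each corruption persist for several steps before switching, which is one simple way to obtain non-stationary perturbations; a real benchmark may use different corruption families and switching dynamics.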