In visual reinforcement learning (RL), agents often fail to generalize to variations in the state space that were not observed during training. These variations can arise in task-irrelevant features, such as background noise, and in task-relevant features, such as robot configurations, which bear on the optimal decisions. To generalize in both cases, an agent must accurately capture how the changed features affect its decisions, i.e., the policy model must establish the true associations between changed features and decisions. However, because features in the state space are inherently correlated, the associations between features and decisions become entangled, making it difficult for the policy to tell them apart. To address this, we propose Saliency-Guided Features Decorrelation (SGFD), which eliminates these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. RFF is used to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is used to identify the changed features. Guided by the saliency map, SGFD reweights samples to minimize the estimated correlations involving the changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results show that SGFD generalizes well across a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant and task-relevant variations.
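The reweighting idea described above can be illustrated with a small NumPy sketch. It is not the authors' implementation, and the abstract does not specify the exact objective: here the dependence between two feature columns is assumed to be the squared Frobenius norm of their weighted cross-covariance in a random Fourier feature space, the saliency map is stood in for by a hypothetical boolean vector `salient` marking the changed features, and the softmax parameterization, the `reg` penalty, and the backtracking step rule are all illustrative assumptions.

```python
import numpy as np


def rff(x, num_funcs, rng):
    """Map one feature column x of shape (N,) to random Fourier features (N, num_funcs)."""
    freq = rng.standard_normal(num_funcs)
    phase = rng.uniform(0.0, 2.0 * np.pi, num_funcs)
    return np.sqrt(2.0) * np.cos(np.outer(x, freq) + phase)


def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()


def pair_dependence(u, v, w):
    """Squared Frobenius norm of the w-weighted cross-covariance of u and v,
    and its gradient with respect to the sample weights w (which sum to one)."""
    u_bar, v_bar = w @ u, w @ v
    cov = (u * w[:, None]).T @ v - np.outer(u_bar, v_bar)
    uc = u @ cov  # row n holds u_n^T C
    grad = 2.0 * ((uc * v).sum(axis=1) - uc @ v_bar - v @ (u_bar @ cov))
    return float((cov ** 2).sum()), grad


def objective(feats, pairs, theta, reg):
    """Dependence over the selected feature pairs plus a weight-spread penalty (assumed)."""
    w = softmax(theta)
    loss = reg * len(w) * float(w @ w)
    grad_w = 2.0 * reg * len(w) * w
    for i, j in pairs:
        dep, g = pair_dependence(feats[i], feats[j], w)
        loss += dep
        grad_w = grad_w + g
    # Chain rule through the softmax parameterization of the weights.
    return loss, w * (grad_w - w @ grad_w)


def learn_weights(z, salient, num_funcs=5, steps=100, reg=0.1, seed=0):
    """Learn sample weights that reduce dependence for pairs touching a salient feature."""
    rng = np.random.default_rng(seed)
    n, d = z.shape
    feats = [rff(z[:, i], num_funcs, rng) for i in range(d)]
    # Only pairs that involve at least one salient (changed) feature are penalized.
    pairs = [(i, j) for i in range(d) for j in range(i + 1, d) if salient[i] or salient[j]]
    theta, lr = np.zeros(n), 0.1 * n
    loss, grad = objective(feats, pairs, theta, reg)
    history = [loss]
    for _ in range(steps):
        cand = theta - lr * grad
        cand_loss, cand_grad = objective(feats, pairs, cand, reg)
        if cand_loss < loss:  # accept only steps that lower the objective
            theta, loss, grad = cand, cand_loss, cand_grad
        else:  # otherwise shrink the step size and retry
            lr *= 0.5
        history.append(loss)
    return softmax(theta), history


# Toy data: feature 1 is correlated with the "changed" feature 0, feature 2 is independent.
rng = np.random.default_rng(1)
a = rng.standard_normal(300)
z = np.column_stack([a, a + 0.5 * rng.standard_normal(300), rng.standard_normal(300)])
weights, history = learn_weights(z, salient=[True, False, False])
print(history[0], history[-1])
```

In this sketch the learned `weights` would multiply the per-sample policy loss. The penalty term is included because, without some constraint, the dependence term alone can be driven to zero by concentrating all weight on a single sample.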