Evolution Strategies (ES) have emerged as a competitive alternative for model-free reinforcement learning, performing strongly on benchmarks such as MuJoCo and Atari. They are particularly effective when the reward function is imperfect, which makes them valuable in real-world applications where dense reward signals are hard to obtain. However, ES implicitly assume that all input features are task-relevant, an assumption that breaks down when irrelevant features are present, as is common in real-world problems. This work examines that limitation, focusing on the Natural Evolution Strategies (NES) variant. We propose NESHT, a novel approach that integrates Hard-Thresholding (HT) with NES to promote sparsity, so that only pertinent features are used. Supported by rigorous analysis and empirical tests, NESHT shows promise in mitigating the pitfalls of irrelevant features and performs well on complex decision-making problems such as noisy MuJoCo and Atari tasks.
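To make the idea of combining NES with hard thresholding concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian-smoothing search-gradient estimator with antithetic sampling, as is common in ES for reinforcement learning, followed by a projection that keeps only the `k` largest-magnitude parameters. The function names (`hard_threshold`, `nes_gradient`, `nes_ht`, `toy_return`), the hyperparameter values, and the toy objective standing in for an episodic return are all illustrative assumptions; the estimator and thresholding schedule used by NESHT may differ.

```python
import numpy as np


def hard_threshold(theta, k):
    """Keep the k largest-magnitude entries of theta and zero the rest (assumes k >= 1)."""
    if k >= theta.size:
        return theta.copy()
    out = np.zeros_like(theta)
    keep = np.argpartition(np.abs(theta), -k)[-k:]  # indices of the top-k magnitudes
    out[keep] = theta[keep]
    return out


def nes_gradient(f, theta, sigma, n_pairs, rng):
    """Antithetic Gaussian-smoothing estimate of the gradient of f at theta."""
    eps = rng.standard_normal((n_pairs, theta.size))
    # Evaluate f at mirrored perturbations theta +/- sigma * eps.
    diffs = np.array([f(theta + sigma * e) - f(theta - sigma * e) for e in eps])
    return (diffs[:, None] * eps).sum(axis=0) / (2.0 * sigma * n_pairs)


def nes_ht(f, theta0, k, sigma=0.1, lr=0.1, n_pairs=64, steps=300, seed=0):
    """Maximise f by ES-style gradient ascent, projecting onto k-sparse vectors each step."""
    rng = np.random.default_rng(seed)
    theta = hard_threshold(np.asarray(theta0, dtype=float), k)
    for _ in range(steps):
        grad = nes_gradient(f, theta, sigma, n_pairs, rng)
        theta = hard_threshold(theta + lr * grad, k)  # ascent step, then sparsify
    return theta


def toy_return(theta):
    """Hypothetical stand-in for a return: only the first two coordinates matter."""
    target = np.array([1.0, -2.0])
    return -np.sum((theta[:2] - target) ** 2)


if __name__ == "__main__":
    # Ten parameters, of which eight are irrelevant to the toy objective.
    print(nes_ht(toy_return, np.zeros(10), k=2))
```

In this toy setting the irrelevant coordinates receive only zero-mean noise from the gradient estimate, so the projection step tends to discard them while the two relevant coordinates accumulate consistent updates. Choosing `k` is itself an assumption here; in practice the number of relevant features is usually unknown.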