Reinforcement learning (RL) is one of the most active fields in machine learning, demonstrating remarkable potential for tackling real-world challenges. Despite its promising prospects, the methodology faces issues and challenges that prevent it from achieving the best possible performance. In particular, RL approaches often perform poorly when navigating environments and solving tasks with large observation spaces, resulting in sample inefficiency and prolonged learning times. This issue, commonly referred to as the curse of dimensionality, complicates decision-making for RL agents, necessitating a careful balance between attention and decision-making. When augmented with feedback from humans or large language models (LLMs), RL agents may exhibit greater resilience and adaptability, leading to enhanced performance and accelerated learning. Such feedback, conveyed through various modalities and granularities including natural language, serves as a guide for RL agents, aiding them in discerning relevant environmental cues and optimizing their decision-making processes. In this survey paper, we focus on a twofold problem: first, we examine human and LLM assistance, investigating the ways in which these entities may collaborate with an RL agent to foster optimal behavior and expedite learning; second, we delve into the research literature dedicated to addressing the intricacies of environments characterized by large observation spaces.