Inspired by insights from cognitive science into human memory and reasoning mechanisms, we propose REMEMBERER, a novel evolvable agent framework based on a large language model (LLM). By equipping the LLM with a long-term experience memory, REMEMBERER can exploit experiences from past episodes, even those collected for different task goals, which gives it an advantage over LLM-based agents that rely on fixed exemplars or a transient working memory. We further introduce Reinforcement Learning with Experience Memory (RLEM) to update the memory. The whole system can thus learn from both successful and failed experiences and evolve its capability without fine-tuning the parameters of the LLM. In this way, REMEMBERER constitutes a semi-parametric RL agent. Extensive experiments are conducted on two RL task sets to evaluate the proposed framework. Averaged over different initializations and training sets, the success rate exceeds the prior state of the art (SOTA) by 4% and 2% on the two task sets, respectively, demonstrating the superiority and robustness of REMEMBERER.