Whether agents can align with their environment without relying on human-labeled data is an intriguing research question. Inspired by the alignment process observed in intelligent organisms, where declarative memory plays a pivotal role in summarizing past experiences, we propose a novel learning framework. In this framework, agents distill insights from past experiences and refine and update existing notes to improve their performance in the environment. Because the entire process takes place within the memory components and is carried out in natural language, we call this framework In-memory Learning. We also examine the key features of benchmarks designed to evaluate the self-improvement process. Through systematic experiments, we demonstrate the effectiveness of our framework and provide insights into this problem.