In this paper, we propose REX, an enhanced approach for Rapid Exploration and eXploitation for AI Agents. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making and the lack of a systematic way to leverage trial-and-error procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. The approach can make use of offline behaviors recorded in logs and integrates seamlessly with existing foundation models without requiring any model fine-tuning. In a comparative analysis against existing methods such as Chain-of-Thought (CoT) and Reasoning viA Planning (RAP), REX-based methods demonstrate comparable performance and, in certain cases, surpass these techniques. Notably, REX-based methods substantially reduce execution time, which enhances their practical applicability across a diverse set of scenarios.
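For readers unfamiliar with Upper Confidence Bound scores, the following is a minimal sketch of the textbook UCB1 rule, not the REX algorithm itself. The function names `ucb1_score` and `select_action`, the `stats` layout, and the exploration constant `c` are assumptions made for illustration; the abstract only states that REX uses concepts similar to UCB.

```python
import math


def ucb1_score(mean_reward, action_count, total_count, c=math.sqrt(2)):
    """Textbook UCB1: mean reward plus an exploration bonus.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if action_count == 0:
        # Untried actions get an infinite score so they are explored first.
        return math.inf
    bonus = c * math.sqrt(math.log(total_count) / action_count)
    return mean_reward + bonus


def select_action(stats, c=math.sqrt(2)):
    """Pick the action with the highest UCB1 score.

    `stats` maps an action name to (sum_of_rewards, times_tried).
    This layout is an assumption chosen for the example.
    """
    total = sum(count for _, count in stats.values())

    def score(action):
        reward_sum, count = stats[action]
        mean = reward_sum / count if count else 0.0
        return ucb1_score(mean, count, total, c)

    return max(stats, key=score)


if __name__ == "__main__":
    # Counts like these could, in principle, be seeded from logged behavior.
    logged = {"plan_a": (9.0, 10), "plan_b": (0.0, 10), "plan_c": (0.0, 0)}
    print(select_action(logged))
```

The bonus term shrinks as an action is tried more often, so rarely tried actions are favored early (exploration) while consistently rewarded actions dominate later (exploitation).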