Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems, built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration, which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these heuristics with the intelligence and internalized human notions of interestingness captured by giant foundation models (FMs). This provides IGE with a human-like ability to instinctively identify how interesting or promising any new state is (e.g. discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting and previously impossible opportunity to recognize and capitalize on serendipitous discoveries that cannot be predicted ahead of time. We evaluate IGE on a range of language-based tasks that require search and exploration. In Game of 24, a multistep mathematical reasoning problem, IGE reaches 100% success rate 70.8% faster than the best classic graph search baseline. Next, in BabyAI-Text, a challenging partially observable gridworld, IGE exceeds the previous SOTA with orders of magnitude fewer online samples. Finally, in TextWorld, we show the unique ability of IGE to succeed in settings requiring long-horizon exploration where prior SOTA FM agents like Reflexion completely fail. Overall, IGE combines the tremendous strengths of FMs and the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
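The archive-and-return loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`go_explore`, `choose_state`, `is_interesting`, the toy number-line task) is hypothetical, actions are picked uniformly at random, and the two callables merely stand in for the judgments that IGE would obtain from a foundation model.

```python
import random
from typing import Any, Callable, List, Optional, Sequence, Tuple


def go_explore(
    start: Any,
    actions: Callable[[Any], Sequence[Any]],
    step: Callable[[Any, Any], Any],
    is_goal: Callable[[Any], bool],
    choose_state: Callable[[List[Any]], Any],
    is_interesting: Callable[[Any, List[Any]], bool],
    iterations: int = 200,
    explore_steps: int = 5,
    seed: int = 0,
) -> Tuple[Optional[Any], List[Any]]:
    """Return (goal_state or None, archive) after a bounded search."""
    rng = random.Random(seed)
    archive = [start]  # every state judged worth returning to
    for _ in range(iterations):
        # "Go": return to the archived state judged most promising.
        # Assumption: in IGE this choice would come from a foundation model.
        state = choose_state(archive)
        # "Explore": take a few steps from that state.
        for _ in range(explore_steps):
            available = actions(state)
            if not available:
                break
            # Assumption: random actions, used here only for simplicity.
            state = step(state, rng.choice(list(available)))
            if is_goal(state):
                return state, archive
            # Archive new states that the (stand-in) judge finds interesting.
            if state not in archive and is_interesting(state, archive):
                archive.append(state)
    return None, archive


# Toy task (hypothetical): walk along the integers from 0 to reach 10.
found, archive = go_explore(
    start=0,
    actions=lambda s: [-1, 1],
    step=lambda s, a: s + a,
    is_goal=lambda s: s == 10,
    choose_state=max,                      # stand-in for "most promising"
    is_interesting=lambda s, seen: True,   # stand-in: every new state counts
)
print(found, len(archive))
```

In this sketch the hand-written `choose_state=max` is exactly the kind of task-specific heuristic that the abstract says IGE replaces with a foundation model's judgment; swapping in a model-backed callable would leave the loop itself unchanged.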

