Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems, built on the principle of archiving discovered states and iteratively returning to, and exploring from, the most promising ones. This approach has led to superhuman performance across a wide variety of challenging problems, including Atari games and robotic control, but it requires manually designed heuristics to guide exploration, which are time-consuming to create and infeasible to specify in general. To resolve this, we propose Intelligent Go-Explore (IGE), which greatly extends the scope of the original Go-Explore by replacing these heuristics with the intelligence and internalized human notions of interestingness captured by giant foundation models (FMs). This gives IGE a human-like ability to instinctively judge how interesting or promising any new state is (e.g., discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting, previously impossible opportunity to recognize and capitalize on serendipitous discoveries that cannot be predicted ahead of time. We evaluate IGE on a range of language-based tasks that require search and exploration. In Game of 24, a multistep mathematical reasoning problem, IGE reaches a 100% success rate 70.8% faster than the best classic graph search baseline. Next, in BabyAI-Text, a challenging partially observable gridworld, IGE exceeds the previous SOTA with orders of magnitude fewer online samples. Finally, in TextWorld, we show the unique ability of IGE to succeed in settings requiring long-horizon exploration, where prior SOTA FM agents such as Reflexion completely fail. Overall, IGE combines the tremendous strengths of FMs with the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
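To make the archive / select / return / explore loop concrete, here is a minimal sketch on Game of 24. It is not the authors' implementation. The names `go_explore_24`, `fm_select_state`, and `fm_is_interesting` are hypothetical, and the two `fm_*` functions are deliberately simple hand-written stand-ins for the points where IGE would query a foundation model: choosing which archived state to return to, and deciding whether a newly found state is worth archiving. To keep the sketch deterministic, exploration enumerates every legal move from the chosen state; in IGE an FM would also choose the actions.

```python
from fractions import Fraction
from itertools import combinations


def successors(state):
    """Yield (step, next_state) for every legal Game of 24 move from `state`.

    A state is a sorted tuple of the Fractions still available; a step is
    (left, op, right, result)."""
    for i, j in combinations(range(len(state)), 2):
        a, b = state[i], state[j]
        rest = [x for k, x in enumerate(state) if k not in (i, j)]
        candidates = [(a, "+", b, a + b), (a, "*", b, a * b),
                      (a, "-", b, a - b), (b, "-", a, b - a)]
        if b != 0:
            candidates.append((a, "/", b, a / b))
        if a != 0:
            candidates.append((b, "/", a, b / a))
        for step in candidates:
            yield step, tuple(sorted(rest + [step[3]]))


def fm_select_state(archive, expanded):
    """Hypothetical stand-in for the FM call that picks a promising archived state.

    Here: the not-yet-explored state with the fewest numbers left
    (ties go to the earliest-archived state)."""
    frontier = [s for s in archive if s not in expanded]
    return min(frontier, key=len) if frontier else None


def fm_is_interesting(state, archive):
    """Hypothetical stand-in for the FM's archive filter: keep only novel states."""
    return state not in archive


def go_explore_24(numbers, max_iters=10_000):
    """Return a list of steps reaching 24, or None if none is found."""
    start = tuple(sorted(Fraction(n) for n in numbers))
    archive = {start: []}  # state -> trajectory (list of steps) that reaches it
    expanded = set()
    for _ in range(max_iters):
        state = fm_select_state(archive, expanded)
        if state is None:
            return None  # every archived state explored: no solution exists
        expanded.add(state)
        # "Return": the game is deterministic, so the stored trajectory suffices.
        trajectory = archive[state]
        for step, nxt in successors(state):  # "Explore" from the returned-to state
            if fm_is_interesting(nxt, archive):
                archive[nxt] = trajectory + [step]
                if nxt == (Fraction(24),):
                    return archive[nxt]
    return None
```

For example, `go_explore_24([4, 7, 8, 8])` returns a sequence of steps whose final result is 24, and `go_explore_24([1, 1, 1, 1])` returns `None`. The stand-in selector is itself a hand-crafted heuristic, which is exactly the component IGE proposes to replace with an FM's judgment of interestingness.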