Language is often considered a key aspect of human thinking, providing us with exceptional abilities to generalize, explore, plan, replan, and adapt to new situations. However, Reinforcement Learning (RL) agents are far from human-level performance in any of these abilities. We hypothesize that one reason for such cognitive deficiencies is that they lack the benefits of thinking in language, and that we can improve AI agents by training them to think like humans do. We introduce a novel Imitation Learning framework, Thought Cloning, whose idea is not just to clone the behaviors of human demonstrators, but also the thoughts humans have as they perform these behaviors. While we expect Thought Cloning to truly shine at scale on internet-sized datasets of humans thinking out loud while acting (e.g., online videos with transcripts), here we conduct experiments in a domain where the thinking and action data are synthetically generated. Results reveal that Thought Cloning learns much faster than Behavioral Cloning, and its performance advantage grows the further out of distribution the test tasks are, highlighting its ability to better handle novel situations. Thought Cloning also provides important benefits for AI Safety and Interpretability, and makes it easier to debug and improve AI. Because we can observe the agent's thoughts, we can (1) more easily diagnose why things are going wrong, making it easier to fix the problem, (2) steer the agent by correcting its thinking, or (3) prevent it from doing unsafe things it plans to do. Overall, by training agents how to think as well as how to behave, Thought Cloning creates safer, more powerful agents.
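To make the contrast with Behavioral Cloning concrete, here is a minimal sketch of one plausible way to combine the two imitation signals. It uses a negative log-likelihood on the demonstrator's actions (plain Behavioral Cloning) plus a weighted negative log-likelihood on the demonstrator's thought tokens. The function names, the weighting coefficient `alpha`, and the assumption that the model yields a probability for each demonstrated thought token and action are hypothetical illustrations. This is not necessarily the exact objective used in this work.

```python
import math
from typing import Sequence


def nll(probs: Sequence[float]) -> float:
    """Negative log-likelihood of the demonstrated items, given the
    probability the model assigned to each of them."""
    return -sum(math.log(p) for p in probs)


def behavioral_cloning_loss(action_probs: Sequence[float]) -> float:
    # Behavioral Cloning: imitate only the demonstrated actions.
    return nll(action_probs)


def thought_cloning_loss(thought_token_probs: Sequence[float],
                         action_probs: Sequence[float],
                         alpha: float = 1.0) -> float:
    # Hypothetical Thought Cloning objective: also imitate the demonstrator's
    # thoughts, weighted by alpha (an assumed hyperparameter).
    return alpha * nll(thought_token_probs) + nll(action_probs)


# Toy example: probabilities a model assigned to one demonstrated step.
thought_probs = [0.5, 0.25]  # e.g. the tokens of a thought such as "open door"
action_probs = [0.5]         # the demonstrated action

bc = behavioral_cloning_loss(action_probs)
tc = thought_cloning_loss(thought_probs, action_probs, alpha=1.0)
print(round(bc, 4), round(tc, 4))
```

The only difference between the two losses is the extra thought term. The agent is therefore penalized for mispredicting what the demonstrator was thinking, not just what it did.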