Exploration remains a significant challenge in reinforcement learning, especially in environments where extrinsic rewards are sparse or absent. The recent rise of foundation models such as CLIP offers an opportunity to leverage pretrained, semantically rich embeddings that encapsulate broad, reusable knowledge. In this work, we explore the potential of these foundation models to drive exploration, and we analyze the critical role of the episodic novelty term in enhancing an agent's exploration. We also investigate whether providing the intrinsic module with complete state information, rather than only partial observations, can improve exploration, despite the difficulty of handling small variations within large state spaces. Our experiments in the MiniGrid domain show that intrinsic modules can effectively exploit full state information, significantly increasing sample efficiency while learning an optimal policy. Moreover, we show that the embeddings provided by foundation models sometimes outperform those constructed by the agent during training, further accelerating learning, especially when coupled with the episodic novelty term.
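The abstract does not specify the exact form of the episodic novelty term, so the following is only a minimal sketch of one common variant: a k-nearest-neighbour novelty bonus computed over per-episode memories of state embeddings (in the style of Never Give Up), where the embeddings are assumed to come from a frozen foundation-model encoder such as CLIP. All names here (`EpisodicNovelty`, `bonus`, `k`) are illustrative, not taken from the paper.

```python
import numpy as np


class EpisodicNovelty:
    """k-nearest-neighbour episodic novelty bonus over state embeddings.

    The memory is cleared at each episode start; the bonus for a new
    embedding is its mean distance to the k closest embeddings seen so
    far in the episode (larger distance = more novel state).
    """

    def __init__(self, k: int = 10, eps: float = 1e-8):
        self.k = k
        self.eps = eps
        self.memory: list[np.ndarray] = []

    def reset(self) -> None:
        """Clear the episodic memory at the start of each episode."""
        self.memory = []

    def bonus(self, embedding: np.ndarray) -> float:
        """Return the novelty bonus for `embedding` and store it in memory."""
        if not self.memory:
            self.memory.append(embedding)
            return 1.0  # first state of the episode is maximally novel
        dists = np.linalg.norm(np.stack(self.memory) - embedding, axis=1)
        knn = np.sort(dists)[: self.k]
        self.memory.append(embedding)
        # Normalise so the bonus is roughly scale-free w.r.t. the embedding space.
        return float(knn.mean() / (dists.mean() + self.eps))
```

Under these assumptions, such a bonus would typically be mixed into the reward as `r = r_ext + beta * novelty.bonus(embed_fn(obs))`, where `embed_fn` is the frozen foundation-model encoder and `beta` a scaling coefficient; both are placeholders here.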