In Reinforcement Learning (RL), agents aim to maximize cumulative reward in a given environment. During learning, RL agents face the exploration-exploitation dilemma: exploit existing knowledge to collect rewards, or explore in search of potentially higher ones. Using uncertainty as a guiding principle offers an active and effective way to resolve this dilemma, and ensemble-based methods are among the prominent approaches to quantifying uncertainty. However, conventional ensemble-based uncertainty estimation lacks an explicit prior and thus deviates from Bayesian principles. Moreover, it requires diversity among ensemble members to produce less biased uncertainty estimates. To address these problems, previous research has incorporated random functions as priors. Building on these efforts, our work introduces an approach with carefully designed prior neural networks (NNs) that incorporate maximal diversity into the initial value functions of RL. Our method outperforms random prior approaches on classic control problems and general exploration tasks, significantly improving sample efficiency.
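To make the baseline idea concrete, the following is a minimal sketch of an ensemble whose members each add a fixed, untrained random prior function to a trainable value function, with disagreement across members serving as the uncertainty signal. It illustrates only the random-prior setting that the abstract builds upon, not the designed priors proposed in this work. The class name `PriorEnsemble`, the prior scale `beta`, the layer sizes, and the zero-initialized linear trainable part are all assumptions made for illustration.

```python
import numpy as np


class PriorEnsemble:
    """Ensemble of Q-functions: trainable part plus a fixed random prior (illustrative)."""

    def __init__(self, n_members, state_dim, n_actions, hidden=32, beta=3.0, seed=0):
        rng = np.random.default_rng(seed)
        self.beta = beta  # assumed prior scale, not a value taken from the paper
        # Fixed random prior networks (never trained): one hidden tanh layer per member.
        self.prior_w1 = rng.normal(size=(n_members, state_dim, hidden))
        self.prior_w2 = rng.normal(size=(n_members, hidden, n_actions)) / np.sqrt(hidden)
        # Trainable linear part, zero-initialized so initial diversity comes from the priors.
        self.train_w = np.zeros((n_members, state_dim, n_actions))

    def q_values(self, state):
        """Return Q-values of shape (n_members, n_actions) for a single state vector."""
        hidden = np.tanh(np.einsum("d,kdh->kh", state, self.prior_w1))
        prior = np.einsum("kh,kha->ka", hidden, self.prior_w2)
        trainable = np.einsum("d,kda->ka", state, self.train_w)
        return trainable + self.beta * prior

    def uncertainty(self, state):
        """Per-action disagreement (standard deviation) across ensemble members."""
        return self.q_values(state).std(axis=0)

    def act(self, state, member):
        """Greedy action under one member, e.g. a member sampled once per episode."""
        return int(np.argmax(self.q_values(state)[member]))


ensemble = PriorEnsemble(n_members=5, state_dim=4, n_actions=2)
state = np.ones(4)
print(ensemble.uncertainty(state))   # disagreement before any training
print(ensemble.act(state, member=0))
```

Before any training, all disagreement in this sketch comes from the priors, which is why the diversity of the prior functions directly shapes the initial exploration behavior.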