Exploration is a significant challenge in practical reinforcement learning (RL), and uncertainty-aware exploration, which quantifies both epistemic and aleatory uncertainty, has been recognized as an effective exploration strategy. However, capturing the combined effect of aleatory and epistemic uncertainty on decision-making is difficult. Existing works estimate the two uncertainties separately and treat the composite uncertainty as an additive combination of them. This additive formulation leads to excessive risk-taking behavior, which causes instability. In this paper, we propose an algorithm that clarifies the theoretical connection between aleatory and epistemic uncertainty, unifies their estimation, and quantifies their combined effect for risk-sensitive exploration. Our method builds on a novel extension of distributional RL that estimates a parameterized return distribution whose parameters are random variables encoding epistemic uncertainty. Experimental results on tasks that pose both exploration and risk challenges show that our method outperforms alternative approaches.
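To make the additive formulation described in the abstract concrete, the following is a minimal sketch of how separately estimated aleatory and epistemic terms are commonly summed into one exploration bonus. It is not the algorithm proposed in the paper. It assumes, purely for illustration, a Gaussian return distribution whose (mean, standard deviation) parameters have been sampled several times, for example from an ensemble; the names `decompose_uncertainty`, `additive_bonus`, and `beta` are hypothetical.

```python
import statistics


def decompose_uncertainty(param_samples):
    """Split return uncertainty into aleatory and epistemic parts.

    param_samples: list of (mean, std) pairs. Each pair is one sampled
    parameter setting of an assumed Gaussian return distribution.
    """
    means = [m for m, _ in param_samples]
    variances = [s * s for _, s in param_samples]
    # Aleatory part: average variance inside each sampled distribution.
    aleatory = statistics.fmean(variances)
    # Epistemic part: how much the predicted mean varies across samples.
    epistemic = statistics.pvariance(means)
    return aleatory, epistemic


def additive_bonus(param_samples, beta=1.0):
    """Additive composite uncertainty, scaled by a hypothetical weight beta."""
    aleatory, epistemic = decompose_uncertainty(param_samples)
    return beta * (aleatory + epistemic) ** 0.5


if __name__ == "__main__":
    samples = [(1.0, 2.0), (3.0, 2.0)]
    print(decompose_uncertainty(samples))
    print(additive_bonus(samples))
```

In this sketch the two terms are estimated independently and only meet in the final sum, which is the separation the abstract identifies as a source of excessive risk-taking.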