We present a novel statistical approach to incorporating uncertainty awareness in model-free distributional reinforcement learning built on quantile-regression-based deep Q-networks. The proposed algorithm, $\textit{Calibrated Evidential Quantile Regression in Deep Q Networks (CEQR-DQN)}$, addresses key challenges in separately estimating aleatoric and epistemic uncertainty in stochastic environments. It combines deep evidential learning with quantile calibration grounded in the principles of conformal inference. This yields explicit, sample-free computations of $\textit{global}$ uncertainty, as opposed to $\textit{local}$ estimates based on simple variance. The approach overcomes limitations of traditional methods in computational and statistical efficiency and in handling out-of-distribution (OOD) observations. On MinAtar, a suite of miniaturized Atari games, CEQR-DQN outperforms comparable existing frameworks in both score and learning speed. Its rigorous evaluation of uncertainty improves exploration and can serve as a blueprint for other algorithms that require uncertainty awareness.
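The abstract does not say how the evidential outputs or the conformal calibration are computed. As a rough illustration only, the sketch below shows two standard building blocks that fit the description. The first is the closed-form, sample-free split of a Normal-Inverse-Gamma (NIG) output into aleatoric and epistemic variance, as in deep evidential regression. The second is a split-conformal margin for a predicted quantile band, in the style of conformalized quantile regression. The function names, the NIG parameterization, and the CQR-style conformity score are assumptions, not the CEQR-DQN implementation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical sketch (assumption): generic building blocks, NOT the CEQR-DQN code.


def nig_uncertainty(gamma: float, nu: float, alpha: float, beta: float) -> Tuple[float, float, float]:
    """Closed-form summary of a Normal-Inverse-Gamma (NIG) evidential output.

    Returns (predicted mean, aleatoric variance, epistemic variance),
    computed analytically with no sampling (deep evidential regression).
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("NIG parameters require nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1)          # E[sigma^2]: noise inherent in the target
    epistemic = beta / (nu * (alpha - 1))   # Var[mu]: uncertainty about the mean itself
    return gamma, aleatoric, epistemic


def conformal_margin(lo: Sequence[float], hi: Sequence[float],
                     y: Sequence[float], miscoverage: float) -> float:
    """Split-conformal correction for a predicted quantile band (CQR-style).

    lo/hi are predicted lower/upper quantiles on a held-out calibration set and
    y are the observed targets. Returns a margin Q such that [lo - Q, hi + Q]
    has marginal coverage >= 1 - miscoverage, assuming exchangeable data.
    """
    n = len(y)
    if n == 0 or not (len(lo) == len(hi) == n):
        raise ValueError("lo, hi and y must be non-empty and the same length")
    # Conformity score: how far each target falls outside its predicted band
    # (negative when the target lies strictly inside the band).
    scores = sorted(max(l - t, t - h) for l, h, t in zip(lo, hi, y))
    # Rank of the finite-sample-corrected (1 - miscoverage) empirical quantile.
    k = math.ceil((n + 1) * (1 - miscoverage))
    if k > n:
        return math.inf  # too few calibration points for the requested coverage
    return scores[k - 1]


if __name__ == "__main__":
    print(nig_uncertainty(gamma=1.0, nu=2.0, alpha=3.0, beta=4.0))
    print(conformal_margin([0, 0, 0, 0], [1, 1, 1, 1], [0.5, 1.5, -0.2, 0.8], 0.25))
```

For example, with NIG parameters (γ, ν, α, β) = (1, 2, 3, 4) the aleatoric term is 4/2 = 2 and the epistemic term is 4/(2·2) = 1. Only the epistemic term shrinks as the evidence ν grows, which is what allows the two sources of uncertainty to be separated without sampling.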