Conventional uncertainty-aware temporal difference (TD) learning methods often rely on simplistic assumptions, typically a zero-mean Gaussian distribution for TD errors. Such oversimplification can lead to inaccurate error representations and compromised uncertainty estimation. In this paper, we introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning, applicable to both discrete and continuous control settings. Our framework enhances the flexibility of error distribution modeling by incorporating higher-order moments, particularly kurtosis, thereby improving the estimation and mitigation of data-dependent noise, i.e., aleatoric uncertainty. We examine the influence of the shape parameter of the generalized Gaussian distribution (GGD) on aleatoric uncertainty and provide a closed-form expression demonstrating an inverse relationship between uncertainty and the shape parameter. Additionally, we propose a theoretically grounded weighting scheme to fully leverage the GGD. To address epistemic uncertainty, we extend batch inverse-variance weighting with bias reduction and kurtosis considerations, yielding improved robustness. Extensive experimental evaluations with policy gradient algorithms demonstrate the consistent efficacy of our method, showing significant performance improvements.