Successful applications of distributional reinforcement learning with quantile regression prompt a natural question: can other statistics be used to represent the distribution of returns? Expectile regression is a natural candidate. It is known to approximate distributions more efficiently than quantile regression, especially at extreme values, and because the 0.5-expectile is exactly the mean, it provides a straightforward estimator of the expected return. Prior work has answered this question positively for expectiles, with the major caveat that expensive computations must be performed to ensure convergence. In this work, we propose a dual expectile-quantile approach that addresses the shortcomings of previous work while leveraging the complementary properties of expectiles and quantiles. Our method outperforms both quantile-based and expectile-based baselines on the MuJoCo continuous control benchmark.
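To make the contrast between the two statistics concrete, here is a minimal sketch of the standard losses involved. It is not the dual expectile-quantile method proposed here, only the textbook definitions. Quantile regression minimizes an asymmetrically weighted absolute error (the pinball loss), while expectile regression minimizes an asymmetrically weighted squared error. For a finite sample, the tau-expectile can be found by iterating its first-order condition; at tau = 0.5 all weights are equal and the expectile reduces to the sample mean. The function names `quantile_loss`, `expectile_loss`, and `expectile` are hypothetical, chosen for illustration.

```python
from typing import Sequence


def _asym_weight(u: float, tau: float) -> float:
    # Weight tau for samples above the prediction, 1 - tau for samples below.
    return tau if u >= 0 else 1.0 - tau


def quantile_loss(pred: float, samples: Sequence[float], tau: float) -> float:
    """Pinball loss: minimized by the tau-quantile of the samples."""
    return sum(_asym_weight(z - pred, tau) * abs(z - pred) for z in samples) / len(samples)


def expectile_loss(pred: float, samples: Sequence[float], tau: float) -> float:
    """Asymmetric squared loss: minimized by the tau-expectile of the samples."""
    return sum(_asym_weight(z - pred, tau) * (z - pred) ** 2 for z in samples) / len(samples)


def expectile(samples: Sequence[float], tau: float, iters: int = 100) -> float:
    """Solve sum_i w_i (z_i - e) = 0 by iteratively reweighted averaging.

    The weights depend on e, so we alternate: fix the weights from the current
    estimate, then set e to the weighted mean of the samples.
    """
    e = sum(samples) / len(samples)  # start from the mean
    for _ in range(iters):
        weights = [_asym_weight(z - e, tau) for z in samples]
        e = sum(w * z for w, z in zip(weights, samples)) / sum(weights)
    return e


if __name__ == "__main__":
    returns = [1.0, 2.0, 3.0, 10.0]
    print(expectile(returns, 0.5))            # 4.0, the sample mean
    print(round(expectile(returns, 0.9), 6))  # 8.0, pulled toward the large return
```

The squared penalty makes expectiles sensitive to the magnitude of tail values rather than only their rank, which is one way to see why they capture extreme values and the mean so directly.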