The Benefits of Being Distributional: Small-Loss Bounds for Reinforcement Learning

While distributional reinforcement learning (DistRL) has been empirically effective, the question of when and why it is better than vanilla, non-distributional RL has remained unanswered. This paper explains the benefits of DistRL through the lens of small-loss bounds, which are instance-dependent bounds that scale with the optimal achievable cost. In particular, when the optimal cost is small, our bounds converge much faster than those of non-distributional approaches. As a warm-up, we propose a distributional contextual bandit (DistCB) algorithm, which we show enjoys small-loss regret bounds and empirically outperforms the state-of-the-art on three real-world tasks. In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation. We prove that our algorithm enjoys novel small-loss PAC bounds in low-rank MDPs. As part of our analysis, we introduce the $\ell_1$ distributional eluder dimension, which may be of independent interest. Then, in offline RL, we show that pessimistic DistRL enjoys small-loss PAC bounds that are novel to the offline setting and are more robust to bad single-policy coverage.
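
To make "scales with the optimal achievable cost" concrete, the sketch below contrasts the generic shape of a worst-case regret bound with that of a small-loss bound. It is an illustration only, not a theorem quoted from the paper: the symbols $K$ (number of rounds or episodes), $V^\star \in [0,1]$ (optimal expected normalized cost), and the constants $C, C'$ (absorbing complexity and logarithmic factors) are assumptions introduced here.

```latex
% Illustrative shapes only; K, V^\star, C, C' are assumed notation,
% not the paper's exact statements.
\begin{align*}
  \text{worst-case:} \quad
    & \mathrm{Regret}(K) \;\lesssim\; C\sqrt{K}, \\
  \text{small-loss:} \quad
    & \mathrm{Regret}(K) \;\lesssim\; C\sqrt{K \cdot V^\star} \;+\; C'.
\end{align*}
% Dividing by K gives the average suboptimality:
%   worst-case:  C / \sqrt{K}
%   small-loss:  C \sqrt{V^\star / K} + C' / K
% so when V^\star \approx 0 the small-loss rate improves from
% order 1/\sqrt{K} to order 1/K.
```

Under these assumptions, the small-loss form is never worse than the worst-case form up to the additive constant (since $V^\star \le 1$), and it is strictly better whenever the optimal policy incurs little cost.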
