A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning

Distributional reinforcement learning (RL) estimates the return distribution mainly by learning quantile values through minimization of the quantile Huber loss. This loss has a threshold parameter that is often selected heuristically or via hyperparameter search, an approach that may not generalize well and can be suboptimal. This paper introduces a generalized quantile Huber loss derived from the Wasserstein distance (WD) between Gaussian distributions, which captures the noise in the predicted (current) and target (Bellman-updated) quantile values. Compared with the classical quantile Huber loss, the proposed loss is more robust to outliers. Notably, the classical Huber loss can be seen as an approximation of the proposed loss, which allows the threshold parameter to be adjusted by approximating the amount of noise in the data during learning. Empirical tests on Atari games, a common application in distributional RL, and on a recent hedging strategy that uses distributional RL validate the effectiveness of the proposed loss and its potential for parameter adjustment in distributional RL.
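For orientation, the classical quantile Huber loss that the abstract takes as its baseline can be sketched as below. This is the standard form used in quantile-regression distributional RL, not the generalized loss proposed in the paper, whose exact expression the abstract does not give. The names `huber`, `quantile_huber`, `u`, `tau`, and `kappa` are illustrative choices, with `kappa` standing for the threshold parameter discussed above.

```python
def huber(u: float, kappa: float) -> float:
    """Classical Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    abs_u = abs(u)
    if abs_u <= kappa:
        return 0.5 * u * u
    return kappa * (abs_u - 0.5 * kappa)


def quantile_huber(u: float, tau: float, kappa: float = 1.0) -> float:
    """Classical quantile Huber loss (assumed standard form, kappa > 0).

    u     : error, target quantile value minus predicted quantile value
    tau   : quantile level in (0, 1)
    kappa : threshold parameter separating the quadratic and linear regimes
    """
    # Asymmetric weight: negative errors are weighted by (1 - tau),
    # non-negative errors by tau.
    weight = abs(tau - (1.0 if u < 0 else 0.0))
    return weight * huber(u, kappa) / kappa


if __name__ == "__main__":
    # A small error falls in the quadratic regime, a large one in the linear regime.
    print(quantile_huber(0.5, tau=0.25, kappa=1.0))
    print(quantile_huber(-3.0, tau=0.25, kappa=1.0))
```

The sketch makes the role of `kappa` concrete: it fixes the point at which the loss switches from quadratic to linear, and it is this value that is usually chosen heuristically or by search.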


