We demonstrate an equivalence between the reinforcement learning problem and the supervised classification problem. Consequently, we equate the exploration-exploitation trade-off in reinforcement learning to the dataset imbalance problem in supervised classification, and identify similarities in how the two are addressed. From our analysis of these problems, we derive a novel loss function for both reinforcement learning and supervised classification. Scope Loss, our new loss function, adjusts gradients to prevent performance losses caused by over-exploitation and dataset imbalance, without requiring any tuning. We test Scope Loss against state-of-the-art loss functions on a suite of benchmark reinforcement learning tasks and on a skewed classification dataset, and show that Scope Loss outperforms the other loss functions.
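One standard way to see a link between policy-gradient reinforcement learning and classification is sketched below. It is an illustration under our own assumptions, not necessarily the derivation used in this work, and it does not implement Scope Loss. For a single sampled action, the REINFORCE surrogate loss `-G * log pi(a|s)` is exactly the cross-entropy loss with the taken action treated as the class label, scaled by the return `G`. The helper names (`softmax`, `cross_entropy`, `reinforce_loss`, `weighted_cross_entropy`, `logit_gradient`) are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Classification loss: negative log-probability of the true label."""
    return -math.log(softmax(logits)[label])

def weighted_cross_entropy(logits, label, weight):
    """Cross-entropy scaled by a per-sample weight (e.g. a class-balancing weight)."""
    return weight * cross_entropy(logits, label)

def reinforce_loss(logits, action, ret):
    """REINFORCE surrogate loss for one sampled action: -G * log pi(a|s)."""
    return -ret * math.log(softmax(logits)[action])

def logit_gradient(logits, label, weight=1.0):
    """Gradient of weight * cross_entropy w.r.t. the logits: weight * (p - onehot(label))."""
    probs = softmax(logits)
    return [weight * (p - (1.0 if i == label else 0.0)) for i, p in enumerate(probs)]
```

With `logits = [2.0, 0.5, -1.0]`, action `0` and return `3.0`, `reinforce_loss` and `weighted_cross_entropy` give the same value, and `logit_gradient(logits, 0, 3.0)` is the corresponding policy-gradient step on the logits. Under this view, the return plays the role that a per-sample or per-class weight plays in imbalanced classification.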