The prevalent use of benchmarks in current offline reinforcement learning (RL) research has led to a neglect of the imbalance of real-world dataset distributions in model development. Real-world offline RL datasets are often imbalanced over the state space, owing to the difficulty of exploration or to safety constraints during data collection. In this paper, we characterize the properties of imbalanced datasets in offline RL, in which the state coverage follows a power-law distribution induced by skewed policies. We show, both theoretically and empirically, that typical offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective at extracting policies from imbalanced datasets. Inspired by natural intelligence, we propose a novel offline RL method that augments CQL with a retrieval process that recalls related past experiences, effectively alleviating the challenges posed by imbalanced datasets. We evaluate our method on several tasks with varying degrees of imbalance, using variants of D4RL. Empirical results demonstrate the superiority of our method over the baselines.
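To make the retrieval idea concrete, the following is a minimal Python sketch, not the paper's implementation: for a query state, recall the k nearest transitions in the offline dataset and blend their empirical returns into the bootstrapped Q-target, so that sparsely covered states fall back on a non-parametric estimate. The function names, the L2 retrieval metric, and the blending weight `lam` are illustrative assumptions.

```python
import numpy as np

def retrieve_k_nearest(query_state, dataset_states, k=5):
    """Return indices of the k transitions whose states are closest in L2 distance.
    (Illustrative retrieval; the actual method may use a learned similarity.)"""
    dists = np.linalg.norm(dataset_states - query_state, axis=1)
    return np.argsort(dists)[:k]

def retrieval_augmented_target(query_state, dataset_states, dataset_returns,
                               bootstrapped_target, lam=0.5, k=5):
    """Blend the standard bootstrapped target with the mean return of the
    recalled neighbors. On states rarely visited in an imbalanced dataset,
    the retrieved estimate acts as a non-parametric prior."""
    idx = retrieve_k_nearest(query_state, dataset_states, k)
    retrieved = dataset_returns[idx].mean()
    return lam * bootstrapped_target + (1.0 - lam) * retrieved

# Toy usage: 1000 transitions with 4-dimensional states.
rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 4))
returns = rng.normal(size=1000)
target = retrieval_augmented_target(states[0], states, returns,
                                    bootstrapped_target=0.3)
print(f"blended target: {target:.3f}")
```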