Offline reinforcement learning (RL) learns effective policies from a static target dataset. Although state-of-the-art (SOTA) offline RL algorithms are promising, they rely heavily on the quality of the target dataset, and their performance can degrade when the target dataset contains only a limited number of samples, as is often the case in real-world applications. To address this issue, domain adaptation that leverages auxiliary samples from related source datasets (such as simulators) can be beneficial. In this context, determining how best to trade off the source and target datasets remains a critical challenge in offline RL. To the best of our knowledge, this paper proposes the first framework that theoretically and experimentally explores how the weight assigned to each dataset affects the performance of offline RL. We establish performance bounds and a convergence neighborhood for our framework, both of which depend on the choice of weight. Furthermore, we identify the existence of an optimal weight for balancing the two datasets. All of the theoretical guarantees and the optimal weight depend on the quality of the source dataset and the size of the target dataset. Our empirical results on the well-known Procgen Benchmark substantiate our theoretical contributions.