Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL

We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the learner's own action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis leverages several new techniques tailored to non-optimistic exploration algorithms, which we anticipate will find broader use.
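The abstract formalizes rich-observation RL as a Block MDP. In this setting, a small set of latent states drives the dynamics, each high-dimensional observation can be emitted by only one latent state (so the state is in principle decodable from the observation), and the learner sees only observations. The toy simulator below is a minimal illustration under assumed dynamics: three latent states, two actions, and a one-hot-plus-Gaussian-noise emission. All names (`transition`, `emit`, `decode`, `rollout`) are hypothetical and not taken from the paper.

```python
import random
from typing import List, Tuple

# Hypothetical toy sizes: 3 latent states, 2 actions, 8-dimensional observations.
NUM_STATES, NUM_ACTIONS, OBS_DIM = 3, 2, 8
Obs = Tuple[float, ...]


def transition(state: int, action: int) -> int:
    """Assumed latent dynamics: action 0 stays put, action 1 moves to the next state."""
    return state if action == 0 else (state + 1) % NUM_STATES


def emit(state: int, rng: random.Random) -> Obs:
    """Rich observation: a one-hot code for the latent state followed by pure noise.
    The one-hot prefix makes the emission supports of different states disjoint."""
    one_hot = [1.0 if i == state else 0.0 for i in range(NUM_STATES)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(OBS_DIM - NUM_STATES)]
    return tuple(one_hot + noise)


def decode(obs: Obs) -> int:
    """Ground-truth decoder: reads the latent state off the one-hot prefix.
    A learner in a Block MDP never gets access to this function."""
    return max(range(NUM_STATES), key=lambda i: obs[i])


def rollout(horizon: int, rng: random.Random) -> Tuple[List[Obs], List[int], List[int]]:
    """Take uniformly random actions; return observations, actions and hidden states."""
    state = 0
    observations, actions, states = [], [], []
    for _ in range(horizon):
        states.append(state)
        observations.append(emit(state, rng))
        action = rng.randrange(NUM_ACTIONS)
        actions.append(action)
        state = transition(state, action)
    return observations, actions, states
```

Here `decode` plays the role of the unknown ground-truth decoder. Representation learning in a Block MDP amounts to recovering an equivalent mapping from observations and actions alone, without ever seeing `states`.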

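The abstract describes multi-step inverse kinematics as predicting the learner's own action from the current observation and an observation from the (possibly distant) future. The sketch below illustrates only that objective: it builds (x_h, x_{h'}, a_h) examples for every h' > h, fits a smoothed tabular estimate of P(a_h | x_h, x_{h'}), and scores it by average negative log-likelihood. It assumes discrete, hashable observations and logged actions. It omits MusIK's exploration procedure and any decoder or function-approximation structure, and the names `inverse_kinematics_examples`, `fit_count_model` and `log_loss` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

# A trajectory is (observations x_1..x_H, actions a_1..a_H); the encoding is assumed.
Trajectory = Tuple[Sequence[Hashable], Sequence[int]]
# An example is (x_h, x_{h'}, a_h) with h' > h.
Example = Tuple[Hashable, Hashable, int]


def inverse_kinematics_examples(trajs: List[Trajectory], h: int) -> List[Example]:
    """Pair the observation at step h with every later observation x_{h'}, h' > h;
    the label is the action the learner itself took at step h."""
    examples = []
    for obs, acts in trajs:
        for h_future in range(h + 1, len(obs)):
            examples.append((obs[h], obs[h_future], acts[h]))
    return examples


def fit_count_model(examples: List[Example], num_actions: int,
                    smoothing: float = 1.0) -> Callable[[Hashable, Hashable, int], float]:
    """Tabular estimate of P(a_h | x_h, x_{h'}) with additive smoothing;
    a stand-in for the general function class a real learner would use."""
    counts = defaultdict(Counter)
    for x_now, x_future, a in examples:
        counts[(x_now, x_future)][a] += 1

    def predict(x_now: Hashable, x_future: Hashable, a: int) -> float:
        c = counts.get((x_now, x_future), Counter())
        return (c[a] + smoothing) / (sum(c.values()) + smoothing * num_actions)

    return predict


def log_loss(predict, examples: List[Example]) -> float:
    """Average negative log-likelihood of the learner's own actions."""
    return -sum(math.log(predict(x, y, a)) for x, y, a in examples) / len(examples)
```

For example, with two deterministic trajectories `(["start", "left", "left"], [0, 0, 0])` and `(["start", "right", "right"], [1, 1, 1])` at `h=0`, the future observation reveals which action was taken, so each example gets probability 0.75 and the loss is about 0.288. If the future observation carried no information about the action, the loss would stay at log 2.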
Representation Learning

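Representation learning uses training data to learn vector representations, overcoming the limitations of hand-crafted features. Methods are usually divided into two broad categories: unsupervised and supervised representation learning. Most unsupervised methods use the latent variables of an autoencoder (for example, a denoising or sparse autoencoder) as the representation. Variational autoencoders, a more recent development, are more tolerant of noise and outliers. However, exactly inferring the latent structure of given data is nearly impossible, and several approximate inference strategies have been developed for this purpose. Other unsupervised methods aim to approximate a specific similarity measure. One proposed framework performs unsupervised, similarity-preserving representation learning by using matrix factorization to preserve pairwise dynamic time warping (DTW) similarities. Another learns DTW-preserving shapelets, so that Euclidean distances in the transformed space approximate the true DTW distances between the original time series.

Supervised representation learning methods exploit label information to better capture the semantic structure of the data. Siamese networks and triplet networks are two popular models of this kind; both aim to maximize distances between classes while minimizing distances within a class.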
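The DTW-preserving methods in the representation learning overview require Euclidean distances between learned representations to track DTW distances between the original time series. As a minimal sketch, the code computes the textbook dynamic-programming DTW distance (with an absolute-difference local cost, which is an assumption) and a mean squared distortion between embedding distances and DTW distances. That distortion is the kind of quantity such methods try to keep small. The sketch does not implement the matrix-factorization or shapelet-learning procedures themselves, and `dtw` and `dtw_distortion` are hypothetical names.

```python
import math
from itertools import combinations
from typing import Sequence

Series = Sequence[float]


def dtw(x: Series, y: Series) -> float:
    """Classic O(len(x) * len(y)) dynamic time warping distance.
    D[i][j] is the cheapest alignment cost of x[:i] with y[:j]."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the best of: repeat y[j-1], repeat x[i-1], or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_distortion(series: Sequence[Series], embeddings: Sequence[Sequence[float]]) -> float:
    """Mean squared gap, over all pairs, between the Euclidean distance of two
    embeddings and the DTW distance of the corresponding original series."""
    pairs = list(combinations(range(len(series)), 2))
    total = 0.0
    for i, j in pairs:
        gap = math.dist(embeddings[i], embeddings[j]) - dtw(series[i], series[j])
        total += gap ** 2
    return total / len(pairs)
```

For instance, `dtw([0, 1, 2], [0, 1, 1, 2])` is 0 because warping absorbs the repeated value. This is why DTW, unlike a plain Euclidean distance on the raw series, can compare series of different lengths.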

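The supervised objective in the representation learning overview pulls same-class examples together and pushes different-class examples apart. It is commonly written as a contrastive loss on pairs (Siamese networks) or as a hinge loss on (anchor, positive, negative) triples (triplet networks). The sketch below shows these two standard forms on plain vectors. The default margin of 1.0 is an assumption, and in practice the vectors would be outputs of a shared embedding network trained to minimize these losses. The function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def contrastive_loss(u: Vector, v: Vector, same_class: bool, margin: float = 1.0) -> float:
    """Pairwise loss for Siamese networks: same-class pairs are penalized by their
    squared distance, different-class pairs only when closer than `margin`."""
    d = math.dist(u, v)
    return d ** 2 if same_class else max(0.0, margin - d) ** 2


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge loss for triplet networks: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)
```

The margin is what rules out the trivial solution. If every embedding collapsed to one point, the triplet loss would equal the margin and each different-class pair would cost the squared margin, so neither loss would be zero.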