We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the optimal linear approximation to the value function. Assuming independent samples, we make three theoretical contributions that improve upon the current state-of-the-art results: (i) we establish refined high-dimensional Berry-Esseen bounds over the class of convex sets, achieving faster rates than the best known results; (ii) we propose and analyze a novel, computationally efficient online plug-in estimator of the asymptotic covariance matrix; and (iii) we derive sharper high-probability convergence guarantees that depend explicitly on the asymptotic variance and hold under weaker conditions than those adopted in the literature. These results enable the construction of confidence regions and simultaneous confidence intervals for the linear parameters of the value function approximation, with guaranteed finite-sample coverage. We demonstrate the applicability of our theoretical findings through numerical experiments.
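As context for the algorithm the abstract studies, here is a minimal sketch of linear TD(0) with Polyak-Ruppert (iterate) averaging under i.i.d. sampling. The toy Markov chain, features, step-size schedule, and all variable names are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 5-state Markov chain with random features,
# random rewards, and discount factor gamma (all illustrative).
n_states, d, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)  # row-stochastic transitions
r = rng.standard_normal(n_states)                    # reward per state
Phi = rng.standard_normal((n_states, d))             # feature map, one row per state

# Stationary distribution of P, used here to draw i.i.d. state samples.
evals, evecs = np.linalg.eig(P.T)
mu = np.real(evecs[:, np.argmax(np.real(evals))])
mu = mu / mu.sum()

T = 200_000
theta = np.zeros(d)      # raw TD iterate
theta_bar = np.zeros(d)  # Polyak-Ruppert running average
for t in range(1, T + 1):
    s = rng.choice(n_states, p=mu)             # i.i.d. state draw
    s_next = rng.choice(n_states, p=P[s])      # one observed transition
    td_err = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += t ** -0.75 * td_err * Phi[s]      # Robbins-Monro step size
    theta_bar += (theta - theta_bar) / t       # online average of iterates

# theta_bar targets the TD fixed point theta*, the solution of A theta* = b
# with A = Phi' diag(mu) (I - gamma P) Phi and b = Phi' diag(mu) r.
A = Phi.T @ np.diag(mu) @ (np.eye(n_states) - gamma * P) @ Phi
b = Phi.T @ (mu * r)
theta_star = np.linalg.solve(A, b)
print(np.linalg.norm(theta_bar - theta_star))
```

The averaged iterate `theta_bar`, rather than the last iterate `theta`, is the quantity whose asymptotic normality underlies the Berry-Esseen bounds and confidence regions described in the abstract.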