Value-function (VF) approximation is a central problem in Reinforcement Learning (RL). Classical non-parametric VF estimation suffers from the curse of dimensionality. As a result, parsimonious parametric models have been adopted to approximate VFs in high-dimensional spaces, with most efforts focused on linear and neural-network-based approaches. In contrast, this paper puts forth a \emph{parsimonious non-parametric} approach, in which \emph{stochastic low-rank algorithms} estimate the VF matrix in an online and model-free fashion. Furthermore, because VFs tend to be multi-dimensional, we propose replacing the classical VF matrix representation with a tensor (multi-way array) representation and then using the PARAFAC decomposition to design an online, model-free tensor low-rank algorithm. Different versions of the algorithms are proposed, their complexity is analyzed, and their performance is assessed numerically on standardized RL environments.
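To make the matrix low-rank idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a tabular state-action value matrix factorized as $Q \approx L R^\top$, with one rank-$k$ row of factors per state and per action, and applies a stochastic semi-gradient step on the temporal-difference error after each observed transition. The function names (`q_value`, `low_rank_q_update`, `train_chain`), the step size, the discount factor, and the toy chain environment are all illustrative assumptions.

```python
import random


def q_value(L, R, s, a):
    """Approximate Q(s, a) as the inner product of row s of L and row a of R."""
    return sum(l * r for l, r in zip(L[s], R[a]))


def low_rank_q_update(L, R, s, a, reward, s_next, done, alpha=0.05, gamma=0.9):
    """One stochastic semi-gradient step on the squared TD error.

    Only the factors of the visited state and action change, so the update
    is online and needs no model of the environment.
    Returns the TD error computed before the update.
    """
    target = reward
    if not done:
        # Bootstrapped target (hypothetical choice: greedy, as in Q-learning).
        target += gamma * max(q_value(L, R, s_next, b) for b in range(len(R)))
    td_error = target - q_value(L, R, s, a)
    old_row = list(L[s])  # keep the old state factors for the action update
    for k in range(len(old_row)):
        L[s][k] += alpha * td_error * R[a][k]
        R[a][k] += alpha * td_error * old_row[k]
    return td_error


def train_chain(n_states=5, rank=2, episodes=200, epsilon=0.2, seed=0):
    """Toy chain (assumed for illustration): action 1 moves right, action 0
    moves left, and reaching the last state yields reward 1 and ends the episode."""
    rng = random.Random(seed)
    L = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n_states)]
    R = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(2)]
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda b: q_value(L, R, s, b))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s_next == n_states - 1
            reward = 1.0 if done else 0.0
            low_rank_q_update(L, R, s, a, reward, s_next, done)
            if done:
                break
            s = s_next
    return L, R
```

In this sketch the number of stored parameters is $k(|\mathcal{S}| + |\mathcal{A}|)$ rather than $|\mathcal{S}||\mathcal{A}|$, which is the sense in which a low-rank model is parsimonious. A PARAFAC variant would, under the same assumptions, keep one factor matrix per state or action dimension and replace the inner product with a sum over rank-one terms.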