Reinforcement Learning (RL) has shown great empirical success in various application domains. The theoretical aspects of the problem have been extensively studied over the past decades, particularly under tabular and linear Markov Decision Process structures. Recently, non-linear function approximation using kernel-based prediction has gained traction. This approach is particularly interesting as it naturally extends the linear structure and helps explain the behavior of neural-network-based models in their infinite-width limit. Existing analytical results, however, do not adequately address the performance guarantees for this case. We will highlight this open problem, overview existing partial results, and discuss related challenges.