Deep Q-learning based algorithms have been applied successfully to many decision-making problems, but their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization and establish sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration by solving a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require structural assumptions on the MDP, such as a linear or low-rank structure.
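To make the iteration structure concrete, the following is a minimal sketch of Fitted Q-Iteration in which each iteration solves a convex regression problem. It is not the convex reformulation analyzed in the paper: as a simplifying assumption, the hidden layer of the two-layer ReLU network is drawn at random and frozen, so that fitting the output weights reduces to ordinary least squares. The toy chain MDP, the one-hot encoding, and all names (`fitted_q_iteration`, `relu_features`, `width`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def relu_features(x, W, b):
    """Hidden layer of a two-layer ReLU network: max(0, x W^T + b)."""
    return np.maximum(0.0, x @ W.T + b)


def fitted_q_iteration(S, A, R, S_next, n_states, n_actions,
                       gamma=0.9, n_iters=200, width=64, seed=0):
    """Fitted Q-Iteration on a fixed batch of transitions (S, A, R, S_next)."""
    rng = np.random.default_rng(seed)
    d = n_states * n_actions
    # ASSUMPTION: hidden weights are random and frozen; only the output
    # layer is fitted. This is a stand-in for the paper's convex program.
    W = rng.normal(size=(width, d))
    b = rng.normal(size=width)

    def encode(s, a):
        # One-hot encoding of (state, action) pairs (illustrative choice).
        x = np.zeros((len(s), d))
        x[np.arange(len(s)), s * n_actions + a] = 1.0
        return x

    phi = relu_features(encode(S, A), W, b)
    # Features of each next state paired with every candidate action.
    phi_next = [
        relu_features(encode(S_next, np.full(len(S_next), a)), W, b)
        for a in range(n_actions)
    ]

    theta = np.zeros(width)  # output-layer weights, so Q_0 = 0
    for _ in range(n_iters):
        # Bellman targets: r + gamma * max_a' Q_k(s', a').
        q_next = np.max(np.stack([p @ theta for p in phi_next], axis=1), axis=1)
        y = R + gamma * q_next
        # Convex step: least-squares fit of the output layer to the targets.
        theta, *_ = np.linalg.lstsq(phi, y, rcond=None)

    def q(s, a):
        return relu_features(encode(s, a), W, b) @ theta

    return q


# Toy deterministic chain MDP (hypothetical): action 1 moves right, action 0
# moves left, and the reward is 1 whenever the next state is the last state.
n_states, n_actions, gamma = 5, 2, 0.9
S, A = [g.ravel() for g in np.meshgrid(np.arange(n_states),
                                       np.arange(n_actions), indexing="ij")]
S_next = np.clip(S + np.where(A == 1, 1, -1), 0, n_states - 1)
R = (S_next == n_states - 1).astype(float)

q = fitted_q_iteration(S, A, R, S_next, n_states, n_actions, gamma=gamma)
q_hat = q(S, A).reshape(n_states, n_actions)  # q_hat[s, a]
greedy_policy = q_hat.argmax(axis=1)
```

In this toy setting the batch covers every state-action pair and the network is wide enough to interpolate the targets, so the sketch behaves like exact value iteration; the sample complexity question studied in the paper concerns the harder regime in which the Q-function must be estimated from limited samples.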