Deep reinforcement learning excels in numerous large-scale practical applications. However, existing performance analyses ignore the unique characteristics of continuous-time control problems, cannot directly estimate the generalization error of the Bellman optimal loss, and require a boundedness assumption. Our work focuses on continuous-time control problems and proposes an analysis method that applies to all such problems whose transition function satisfies the semigroup and Lipschitz properties. With this method, we can directly analyze the \emph{a priori} generalization error of the Bellman optimal loss. The core of the method lies in two transformations of the loss function; to carry them out, we propose a decomposition method for the maximum operator. The analysis does not require a boundedness assumption. Finally, we obtain an \emph{a priori} generalization error estimate that is free of the curse of dimensionality.