This paper provides a finite-sample analysis of a passive stochastic gradient Langevin dynamics (PSGLD) algorithm designed to achieve adaptive inverse reinforcement learning (IRL). By passive, we mean that the noisy gradients available to the PSGLD algorithm (the inverse learning process) are evaluated at randomly chosen points by an external stochastic gradient algorithm (the forward learner) that aims to optimize a cost function. The PSGLD algorithm acts as a randomized sampler that achieves adaptive IRL by reconstructing this cost function nonparametrically from the stationary measure of a Langevin diffusion. Previous work has analyzed the asymptotic performance of this passive algorithm using weak convergence techniques. This paper analyzes the non-asymptotic (finite-sample) performance using a logarithmic Sobolev inequality and the Otto-Villani theorem. We obtain finite-sample bounds on the 2-Wasserstein distance between the estimates generated by the PSGLD algorithm and the cost function. Apart from achieving finite-sample guarantees for adaptive IRL, this work extends a line of research on the analysis of passive stochastic gradient algorithms to the finite-sample regime for Langevin dynamics.
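To make the passive setting concrete, the following is a minimal one-dimensional sketch, not the algorithm analyzed in the paper. It assumes a kernel-weighted Langevin update, a quadratic cost c(x) = x^2 / 2, and evaluation points drawn uniformly at random in place of an actual forward learner. All names and parameter values (`passive_langevin`, `reconstruct_cost`, `delta`, `half_width`, and so on) are hypothetical. Under these assumptions the samples should settle near a density proportional to exp(-beta * c), so the cost can be read off, up to an additive constant, from the negative log of a histogram.

```python
import numpy as np

def gaussian_kernel(u, delta):
    """Scaled Gaussian kernel K_delta(u) = K(u / delta) / delta."""
    return np.exp(-0.5 * (u / delta) ** 2) / (delta * np.sqrt(2.0 * np.pi))

def passive_langevin(n_steps=8000, n_chains=2000, mu=0.05, beta=1.0,
                     delta=0.5, half_width=5.0, grad_noise=0.5, seed=0):
    """Run independent passive Langevin chains (illustrative assumptions only).

    The inverse learner never chooses where gradients are evaluated: it only
    sees noisy gradients at points theta picked by someone else, and weights
    each one by how close theta is to its own estimate alpha.
    """
    rng = np.random.default_rng(seed)
    # Assumed sampling density of the evaluation points: uniform on [-L, L].
    pi_density = 1.0 / (2.0 * half_width)
    alpha = np.zeros(n_chains)
    for _ in range(n_steps):
        # Stand-in for the forward learner: random evaluation points.
        theta = rng.uniform(-half_width, half_width, size=n_chains)
        # Noisy gradient of the assumed cost c(x) = x^2 / 2, evaluated at theta.
        noisy_grad = theta + grad_noise * rng.standard_normal(n_chains)
        # Kernel weight: gradients evaluated far from alpha barely count.
        weight = gaussian_kernel(theta - alpha, delta)
        drift = 0.5 * beta * weight * noisy_grad
        # Langevin step: weighted drift plus injected Gaussian noise.
        noise = np.sqrt(mu * pi_density) * rng.standard_normal(n_chains)
        alpha = alpha - mu * drift + noise
    return alpha

def reconstruct_cost(samples, beta=1.0, bins=20, lim=3.0):
    """Estimate the cost, up to a constant, as -log(histogram density) / beta."""
    density, edges = np.histogram(samples, bins=bins, range=(-lim, lim),
                                  density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = density > 0  # avoid log(0) in empty bins
    cost = -np.log(density[keep]) / beta
    return centers[keep], cost - cost.min()

samples = passive_langevin()
centers, cost = reconstruct_cost(samples)
```

With `beta = 1` and the quadratic cost assumed here, the final samples should look roughly standard normal, and `cost` should trace an approximate parabola over `centers`. The kernel bandwidth `delta` trades bias against variance in this sketch: a narrower kernel localizes the gradient information better but discards more of the forward learner's evaluations.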