A common pipeline in learning-based control is to iteratively estimate a model of the system dynamics and apply a trajectory optimization algorithm (e.g.~$\mathtt{iLQR}$) to the learned model to minimize a target cost. This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems. We analyze an algorithm which iterates between estimating local linear models of nonlinear system dynamics and performing $\mathtt{iLQR}$-like policy updates. We demonstrate that this algorithm attains sample complexity polynomial in the relevant problem parameters and, by synthesizing locally stabilizing gains, overcomes exponential dependence on the problem horizon. Experimental results validate the performance of our algorithm and compare it to natural deep-learning baselines.
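To make the two ingredients named above concrete, the following is a minimal sketch, not the algorithm analyzed in the paper. It assumes a hypothetical simulator `pendulum_step` that can be queried at perturbed state-input pairs, fits a local linear model $(A, B)$ by least squares, and then runs a standard finite-horizon Riccati recursion to obtain stabilizing gains. All function names, the toy pendulum dynamics, the cost matrices, and the sample sizes are illustrative assumptions.

```python
import numpy as np


def estimate_local_linear_model(step, x_nom, u_nom, num_samples=200, radius=1e-2, seed=0):
    """Least-squares fit of step(x, u) - step(x_nom, u_nom) ~ A dx + B du.

    Assumes (for illustration) that `step` can be queried at arbitrary points.
    """
    rng = np.random.default_rng(seed)
    nx = x_nom.size
    base = step(x_nom, u_nom)
    # Small random perturbations of the stacked (state, input) vector.
    perturb = radius * rng.standard_normal((num_samples, nx + u_nom.size))
    deltas = np.stack(
        [step(x_nom + p[:nx], u_nom + p[nx:]) - base for p in perturb]
    )
    # Solve perturb @ theta ~ deltas, so theta.T = [A B].
    theta, *_ = np.linalg.lstsq(perturb, deltas, rcond=None)
    AB = theta.T
    return AB[:, :nx], AB[:, nx:]


def lqr_gains(A_seq, B_seq, Q, R):
    """Backward Riccati recursion; returns gains K_t for the law u_t = K_t x_t."""
    P = Q.copy()
    gains = []
    for A, B in zip(reversed(A_seq), reversed(B_seq)):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ K)
        gains.append(K)
    return gains[::-1]


def pendulum_step(x, u, dt=0.05):
    """Hypothetical toy dynamics: Euler-discretized pendulum near its upright position."""
    theta, omega = x
    return np.array([theta + dt * omega, omega + dt * (np.sin(theta) + u[0])])


# Linearize around the (unstable) upright equilibrium and reuse the same
# local model at every step of a 200-step horizon.
A_hat, B_hat = estimate_local_linear_model(pendulum_step, np.zeros(2), np.zeros(1))
gains = lqr_gains([A_hat] * 200, [B_hat] * 200, Q=np.eye(2), R=np.eye(1))
closed_loop_radius = max(abs(np.linalg.eigvals(A_hat + B_hat @ gains[0])))
```

In this toy setting the open-loop linearization has an eigenvalue outside the unit circle, while the closed-loop matrix `A_hat + B_hat @ gains[0]` does not. This is the intuition behind using locally stabilizing gains to keep errors from compounding over the horizon. A full method would re-estimate the local models along each new nominal trajectory rather than around a single fixed point.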