This paper considers single-trajectory system identification for linear systems under general nonlinear and/or time-varying policies with i.i.d. random excitation noise. The problem is motivated by safe learning-based control of constrained linear systems, where the safe policies used during learning are usually nonlinear and time-varying in order to satisfy the state and input constraints. We provide a non-asymptotic error bound for least-squares estimation that holds when the data trajectory is generated by any nonlinear and/or time-varying policy, as long as the resulting state and action trajectories are bounded. This significantly generalizes existing non-asymptotic guarantees for linear system identification, which usually assume i.i.d. random inputs or linear policies. Interestingly, our error bound matches the bounds known for linear policies in its dependence on the trajectory length, the system dimensions, and the excitation level. Lastly, we demonstrate an application of our results to safe learning with robust model predictive control and provide numerical analysis.
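To make the setting concrete, the following is a minimal sketch of least-squares identification from a single trajectory, and not the paper's algorithm or experiments. It assumes dynamics of the form x_{t+1} = A x_t + B u_t + w_t and an input u_t = pi_t(x_t) + eta_t with i.i.d. bounded excitation eta_t. The saturated, time-varying feedback policy, the gain `K`, the noise scales, and the function names `simulate` and `least_squares_estimate` are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate(A, B, T, rng, eta_scale=0.5, w_scale=0.1, u_max=1.0):
    """Roll out one trajectory under a nonlinear, time-varying policy.

    Assumed model (illustrative): x_{t+1} = A x_t + B u_t + w_t,
    with u_t = clip(g_t * K x_t) + eta_t and i.i.d. uniform eta_t, w_t.
    """
    n, m = B.shape
    X = np.zeros((T + 1, n))
    U = np.zeros((T, m))
    K = -0.5 * np.ones((m, n))  # hypothetical feedback gain
    for t in range(T):
        gain = 1.0 + 0.5 * np.sin(0.1 * t)                 # time-varying part
        nominal = np.clip(gain * K @ X[t], -u_max, u_max)  # saturation: nonlinear
        eta = rng.uniform(-eta_scale, eta_scale, size=m)   # i.i.d. excitation
        U[t] = nominal + eta
        w = rng.uniform(-w_scale, w_scale, size=n)         # bounded disturbance
        X[t + 1] = A @ X[t] + B @ U[t] + w
    return X, U


def least_squares_estimate(X, U):
    """Least-squares estimate of [A B] from one state/input trajectory."""
    n = X.shape[1]
    Z = np.hstack([X[:-1], U])  # regressors z_t = [x_t; u_t], shape (T, n + m)
    # Solve min_Theta ||Z Theta - X_next||_F^2; Theta has shape (n + m, n).
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
    Theta = Theta.T             # shape (n, n + m), i.e. [A_hat  B_hat]
    return Theta[:, :n], Theta[:, n:]


# Example usage on a small, stable, hypothetical system.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])
rng = np.random.default_rng(0)
for T in (100, 1000, 10000):
    X, U = simulate(A_true, B_true, T, rng)
    A_hat, B_hat = least_squares_estimate(X, U)
    err = np.linalg.norm(np.hstack([A_hat - A_true, B_hat - B_true]), 2)
    print(f"T={T:6d}  spectral-norm error of [A B]: {err:.4f}")
```

Because the inputs and disturbances are bounded and `A_true` is stable, the simulated states and actions stay bounded, which is the condition the abstract places on the data-generating policy. The printed error is only an empirical illustration and should not be read as the paper's bound.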