This paper considers the problem of data generation for MPC policy approximation. Learning an approximate MPC policy from expert demonstrations requires a large data set of optimal state-action pairs sampled across the feasible state space. Yet the key challenge of generating these training samples efficiently has not been widely studied. Recently, a sensitivity-based data augmentation framework for MPC policy approximation was proposed, in which parametric sensitivities are exploited to cheaply generate several additional samples from a single offline MPC computation. The error introduced by augmenting the training data set with inexact samples was shown to increase with the size of the neighborhood around each sample used for data augmentation. Building upon this work, this paper presents an improved data augmentation scheme based on predictor-corrector steps that enforces a user-defined level of accuracy, and shows that the error bound of the augmented samples is independent of the size of the neighborhood used for data augmentation.
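To make the predictor-corrector idea concrete, the following is a minimal sketch on a toy problem, not the scheme as implemented in the paper. It assumes a scalar, unconstrained parametric problem min_u 0.5*u^2 + 0.25*u^4 - x*u, whose stationarity condition u + u^3 - x = 0 stands in for the KKT conditions of an MPC problem. The function names (`residual`, `solve`, `augment`), the tolerances, and the neighborhood offsets are all hypothetical choices made for illustration. The predictor is the first-order sensitivity update around a solved sample; the corrector applies Newton steps until the optimality residual falls below the user-defined tolerance.

```python
import numpy as np

def residual(u, x):
    # Stationarity condition of the assumed toy problem
    # min_u 0.5*u**2 + 0.25*u**4 - x*u  (stand-in for the KKT residual).
    return u + u**3 - x

def residual_du(u):
    # Derivative of the residual with respect to the decision variable u.
    return 1.0 + 3.0 * u**2

def solve(x, u_init=0.0, tol=1e-10, max_iter=50):
    # "Expensive" offline solve: plain Newton iteration from a cold start.
    u = u_init
    for _ in range(max_iter):
        r = residual(u, x)
        if abs(r) <= tol:
            break
        u -= r / residual_du(u)
    return u

def augment(x0, u0, offsets, tol=1e-8, max_corr=20):
    # Sensitivity du/dx at the solved sample, from the implicit function
    # theorem: -(dF/du)^-1 * dF/dx, with dF/dx = -1 for this toy problem.
    du_dx = 1.0 / residual_du(u0)
    samples = []
    for dx in offsets:
        x = x0 + dx
        u_pred = u0 + du_dx * dx          # predictor: tangent (sensitivity) step
        u = u_pred
        n_corr = 0
        # Corrector: Newton steps until the residual meets the tolerance.
        while abs(residual(u, x)) > tol and n_corr < max_corr:
            u -= residual(u, x) / residual_du(u)
            n_corr += 1
        samples.append((x, u_pred, u, n_corr))
    return samples

x0 = 1.0
u0 = solve(x0)                            # one offline solve
offsets = np.linspace(-2.0, 2.0, 9)       # deliberately large neighborhood
samples = augment(x0, u0, offsets)

for x, u_pred, u, n_corr in samples:
    print(f"x={x:+.2f}  predictor residual={abs(residual(u_pred, x)):.2e}  "
          f"corrected residual={abs(residual(u, x)):.2e}  corrector steps={n_corr}")
```

In this sketch the predictor-only residual grows as the offset from `x0` grows, whereas every corrected sample satisfies the chosen tolerance regardless of the offset, at the cost of a few extra Newton steps for the more distant samples.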