Closed-loop performance of sequential decision-making algorithms, such as model predictive control, depends strongly on the parameters of cost functions, models, and constraints. Bayesian optimization is a common approach to learning these parameters from closed-loop experiments. However, traditional Bayesian optimization approaches treat the learning problem as a black box, ignoring valuable information and knowledge about the structure of the underlying problem, resulting in slow convergence and high experimental resource use. We propose a time-series-informed optimization framework that incorporates intermediate performance evaluations from early iterations of each experimental episode into the learning procedure. Additionally, probabilistic early-stopping criteria are proposed to terminate unpromising experiments, significantly reducing experimental time. Simulation results show that our approach matches baseline performance with approximately half the resources. Moreover, with the same resource budget, our approach outperforms the baseline in terms of final closed-loop performance, highlighting its efficiency in sequential decision-making scenarios.
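To make the early-stopping idea concrete, the following is a minimal sketch, not the paper's actual criterion: it assumes an episode accrues stage costs step by step, crudely extrapolates the final cost from the partial sum under a Gaussian noise model, and aborts the episode once the probability of beating the incumbent best cost falls below a threshold. The function names, the linear extrapolation, and the noise model are all illustrative assumptions.

```python
import math


def prob_improvement(partial_cost, steps_done, total_steps, best_cost, noise_std=1.0):
    """Probability that this episode's final cost beats the incumbent best,
    under a crude linear extrapolation of the remaining stage costs.
    (Illustrative model only, not the criterion from the paper.)"""
    projected = partial_cost * total_steps / steps_done  # extrapolated final cost
    # Uncertainty shrinks as more of the episode has been observed.
    std = noise_std * math.sqrt(total_steps - steps_done)
    if std == 0.0:
        return 1.0 if projected < best_cost else 0.0
    z = (best_cost - projected) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Gaussian CDF at z


def run_episode(stage_costs, best_cost, total_steps, threshold=0.05):
    """Accumulate stage costs; stop early once improvement is implausible.
    Returns (accumulated_cost, steps_used, stopped_early)."""
    cost = 0.0
    for k, c in enumerate(stage_costs, start=1):
        cost += c
        if prob_improvement(cost, k, total_steps, best_cost) < threshold:
            return cost, k, True  # unpromising: abort the experiment
    return cost, len(stage_costs), False
```

For example, with an incumbent best cost of 10 over a 10-step episode, an episode accruing 5.0 per step is abandoned after the first step, while one accruing 0.5 per step runs to completion; this is the mechanism by which unpromising experiments free up the resource budget.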