This work proposes an approach that integrates reinforcement learning and model predictive control (MPC) to efficiently solve finite-horizon optimal control problems in mixed-logical dynamical systems. Optimization-based control of such systems, which involve both discrete and continuous decision variables, entails the online solution of mixed-integer quadratic or linear programs, which suffer from the curse of dimensionality. Our approach mitigates this issue by effectively decoupling the decision on the discrete variables from the decision on the continuous variables. Moreover, to counter the combinatorial growth in the number of possible actions with the prediction horizon, we introduce decoupled Q-functions that make the learning problem more tractable. The use of reinforcement learning reduces the online optimization problem of the MPC controller from a mixed-integer linear (quadratic) program to a linear (quadratic) program, greatly reducing the computation time. Simulation experiments for a microgrid, based on real-world data, demonstrate that the proposed method significantly reduces the online computation time of the MPC approach and that it generates policies with small optimality gaps and high feasibility rates.