Many optimal control problems require the simultaneous output of continuous and discrete control variables. Such problems are usually formulated as mixed-integer optimal control (MIOC) problems, which are challenging to solve because of the complexity of the solution space. Numerical methods such as branch-and-bound are computationally expensive and unsuitable for real-time control. This paper proposes a novel continuous-discrete reinforcement learning (CDRL) algorithm, twin delayed deep deterministic actor-Q (TD3AQ), for MIOC problems. TD3AQ combines the advantages of actor-critic and Q-learning methods and handles continuous and discrete action spaces simultaneously. The proposed algorithm is evaluated on a hybrid electric vehicle (HEV) energy management problem, where real-time control of the continuous variable (engine torque) and the discrete variable (gear ratio) is essential to maximize fuel economy while satisfying driving constraints. Simulation results on different drive cycles show that TD3AQ achieves near-optimal solutions relative to dynamic programming (DP) and outperforms the state-of-the-art discrete RL algorithm Rainbow, which is adapted to MIOC by discretizing the continuous actions into a finite set of discrete values.
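To make the contrast between the two action-handling strategies concrete, the following minimal Python sketch compares a discretized joint action set (the kind of adaptation described for the Rainbow baseline) with a hybrid selection step in which a continuous action is proposed first and the discrete action is then chosen by an argmax over Q-values. The abstract does not spell out the TD3AQ architecture, so the hybrid step is only an assumed, commonly used structure; the function names, the toy actor and Q-function, the torque range of 0 to 200, the 21 torque levels, and the six gears are all hypothetical values chosen for illustration.

```python
from itertools import product


def discretized_action_set(torque_min, torque_max, n_torque_levels, gears):
    """Baseline idea: put the continuous torque on a uniform grid and
    enumerate every (torque, gear) pair as one discrete action."""
    step = (torque_max - torque_min) / (n_torque_levels - 1)
    torques = [torque_min + i * step for i in range(n_torque_levels)]
    return list(product(torques, gears))


def select_hybrid_action(state, actor, q_values):
    """Assumed hybrid step: the actor proposes the continuous action, then
    the discrete action is the argmax over one Q-value per discrete choice."""
    torque = actor(state)
    q = q_values(state, torque)
    gear_index = max(range(len(q)), key=q.__getitem__)
    return torque, gear_index


# Hypothetical stand-ins for trained networks, used only to run the sketch.
def toy_actor(state):
    # Clip the demanded torque to the assumed feasible range [0, 200].
    return min(max(state["demand"], 0.0), 200.0)


def toy_q_values(state, torque):
    # Pretend gear i is best when the torque is close to 40 * (i + 1).
    return [-abs(torque - 40.0 * (i + 1)) for i in range(6)]


joint_actions = discretized_action_set(0.0, 200.0, 21, [1, 2, 3, 4, 5, 6])
n_joint_actions = len(joint_actions)  # 21 torque levels * 6 gears = 126

hybrid_action = select_hybrid_action({"demand": 90.0}, toy_actor, toy_q_values)
# hybrid_action == (90.0, 1): torque stays continuous, gear index 1 is chosen
```

In this sketch the discretized set grows with the product of the grid resolution and the number of gears, and the torque can only take grid values, whereas the hybrid step keeps the torque continuous and evaluates only one Q-value per gear.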