Multi-mode plug-in hybrid electric vehicle (PHEV) technology, which has emerged recently, is one pathway toward decarbonization, and its energy management requires multiple-input and multiple-output (MIMO) control. At present, existing methods usually decouple the MIMO control into multiple-input and single-output (MISO) control and can achieve only locally optimal performance. To optimize the multi-mode vehicle globally, this paper studies a MIMO control method for energy management of the multi-mode PHEV based on multi-agent deep reinforcement learning (MADRL). By introducing a relevance ratio, a hand-shaking strategy is proposed that enables two learning agents to work collaboratively under the MADRL framework using the deep deterministic policy gradient (DDPG) algorithm. Unified settings for the DDPG agents are obtained through a sensitivity analysis of the factors that influence learning performance. The optimal working mode for the hand-shaking strategy is determined through a parametric study on the relevance ratio. The advantage of the proposed energy management method is demonstrated on a software-in-the-loop testing platform. The results indicate that the learning rate of the DDPG agents is the most influential factor in learning performance. Using the unified DDPG settings and a relevance ratio of 0.2, the proposed MADRL method can save up to 4% energy compared to the single-agent method.
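The abstract does not state how the relevance ratio enters the learning loop. One plausible reading, sketched here purely as an assumption and not as the authors' formulation, is that each of the two DDPG agents learns from a blend of its own reward and its partner's reward, with the relevance ratio setting the weight on the partner. The function name `blend_rewards` and the linear mixing rule are hypothetical.

```python
from typing import Tuple


def blend_rewards(
    reward_a: float, reward_b: float, relevance_ratio: float
) -> Tuple[float, float]:
    """Mix two agents' rewards (hypothetical reading of the hand-shaking idea).

    relevance_ratio = 0 means each agent sees only its own reward
    (fully independent learners); relevance_ratio = 1 means each agent
    sees only its partner's reward. The linear rule is an assumption.
    """
    if not 0.0 <= relevance_ratio <= 1.0:
        raise ValueError("relevance_ratio must lie in [0, 1]")
    own_weight = 1.0 - relevance_ratio
    mixed_a = own_weight * reward_a + relevance_ratio * reward_b
    mixed_b = own_weight * reward_b + relevance_ratio * reward_a
    return mixed_a, mixed_b


if __name__ == "__main__":
    # Illustrative reward values only; 0.2 is the ratio quoted in the abstract.
    print(blend_rewards(1.0, 0.0, 0.2))
```

Under this assumed rule, a ratio of 0.2 would leave each agent driven mostly by its own objective while still receiving a signal about how its actions affect its partner.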