This paper proposes an actor-critic algorithm for controlling the temperature of a battery pack with a cooling fluid. The system is modeled by a coupled system of one-dimensional partial differential equations (PDEs) with a controlled advection term that sets the speed of the cooling fluid. The Hamilton-Jacobi-Bellman (HJB) equation is a PDE whose solution is the optimal value function, and an optimal controller can be derived from it. Our algorithm treats the value network as a Physics-Informed Neural Network (PINN) that solves the continuous-time HJB equation rather than a discrete-time Bellman optimality equation, and we derive an optimal controller for this environment that the algorithm exploits. Our experiments show that a hybrid-policy method, which updates the value network with the HJB equation and updates the policy network exactly as in Proximal Policy Optimization (PPO), achieves the best results among the methods compared for controlling this PDE system.
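The abstract does not state the model equations, so the following is only a minimal sketch under loudly stated assumptions. It shows how the residual of the discounted continuous-time HJB equation, rho * V(x) = max_u [ r(x, u) + dV/dx(x) . f(x, u) ], can serve as a PINN-style loss for a value function, and how dynamics that are affine in the control give a closed-form maximizing controller. Everything here is hypothetical and not taken from the paper: a single 1D temperature field with diffusion plus controlled advection (the paper's coupled model is not reproduced), a quadratic running cost, the discount rate `RHO`, the speed bounds `U_MIN`/`U_MAX`, and a quadratic surrogate `V(T) = -k * sum((T - T_REF)^2) * DX` standing in for the value network so that its gradient is available without automatic differentiation.

```python
import random

# Hypothetical discretization and costs (illustrative assumptions, not the paper's values).
N = 20                     # grid points for a single 1D temperature field on [0, 1]
DX = 1.0 / (N - 1)
ALPHA = 0.01               # assumed diffusivity
RHO = 0.1                  # assumed continuous-time discount rate
C_U = 0.5                  # assumed weight on control effort
U_MIN, U_MAX = 0.0, 2.0    # assumed admissible fluid speeds
T_REF = 0.0                # assumed target temperature (shifted units)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def drift_parts(T):
    """Split the semi-discretized dynamics as dT/dt = a(T) + u * g(T).

    a(T): diffusion by central differences, zero-flux ends via ghost cells.
    g(T): minus the backward (upwind for u >= 0) difference, so u * g(T) is advection at speed u.
    """
    padded = [T[0]] + list(T) + [T[-1]]
    a = [ALPHA * (padded[i + 2] - 2 * padded[i + 1] + padded[i]) / DX**2 for i in range(len(T))]
    g = [-(padded[i + 1] - padded[i]) / DX for i in range(len(T))]
    return a, g


def reward(T, u):
    # Assumed running reward: squared deviation from T_REF plus quadratic control effort.
    return -sum((t - T_REF) ** 2 for t in T) * DX - C_U * u**2


def hamiltonian(T, u, dV):
    # r(T, u) + dV/dT . f(T, u)
    a, g = drift_parts(T)
    return reward(T, u) + dot(dV, [ai + u * gi for ai, gi in zip(a, g)])


def optimal_u(T, dV):
    # The Hamiltonian is a concave quadratic in u, so the constrained maximizer is
    # the stationary point (dV . g) / (2 * C_U) clipped to [U_MIN, U_MAX].
    _, g = drift_parts(T)
    return min(max(dot(dV, g) / (2 * C_U), U_MIN), U_MAX)


def hjb_residual(T, V, dV):
    # Discounted HJB residual: rho * V(T) - max_u [ r(T, u) + dV/dT . f(T, u) ].
    return RHO * V - hamiltonian(T, optimal_u(T, dV), dV)


def pinn_loss(k, states):
    """Mean squared HJB residual over sampled states for the quadratic surrogate.

    A PINN would instead evaluate a neural V(T) and obtain dV/dT by autodiff.
    """
    total = 0.0
    for T in states:
        V = -k * sum((t - T_REF) ** 2 for t in T) * DX
        dV = [-2 * k * (t - T_REF) * DX for t in T]
        total += hjb_residual(T, V, dV) ** 2
    return total / len(states)


# Usage: sample collocation states and pick the surrogate scale with the smallest residual.
rng = random.Random(0)
states = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(16)]
candidate_ks = [0.5, 1.0, 2.0, 5.0, 10.0]
best_k = min(candidate_ks, key=lambda k: pinn_loss(k, states))
```

The grid search over `k` is a crude stand-in for gradient descent on network weights; with a real value network, the same squared residual would be minimized over sampled states, and `optimal_u` is the kind of closed-form controller that a control-affine model with quadratic effort cost admits. Whether the paper's derived controller has this form is not stated in the abstract.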