Safe Reinforcement Learning for Real-World Engine Control

This work introduces a toolchain for applying Reinforcement Learning (RL), specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, in safety-critical real-world environments. As an exemplary application, transient load control is demonstrated on a single-cylinder internal combustion engine testbench in Homogeneous Charge Compression Ignition (HCCI) mode, which offers high thermal efficiency and low emissions. However, HCCI poses challenges for traditional control methods due to its nonlinear, autoregressive, and stochastic nature. RL provides a viable solution, but safety concerns, such as excessive pressure rise rates, must be addressed when applying it to HCCI. A single unsuitable control input can severely damage the engine or cause misfiring and shutdown. Additionally, operating limits are not known a priori and must be determined experimentally. To mitigate these risks, real-time safety monitoring based on the k-nearest neighbor algorithm is implemented, enabling safe interaction with the testbench. The feasibility of this approach is demonstrated as the RL agent learns a control policy through interaction with the testbench. A root mean square error of 0.1374 bar is achieved for the indicated mean effective pressure, comparable to neural network-based controllers from the literature. The toolchain's flexibility is further demonstrated by adapting the agent's policy to increase ethanol energy shares, promoting renewable fuel use while maintaining safety. This approach addresses the longstanding challenge of applying RL to safety-critical real-world environments. The developed toolchain, with its adaptability and safety mechanisms, paves the way for future application of RL in engine testbenches and other safety-critical settings.
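To illustrate the idea of k-nearest-neighbor safety monitoring, the following is a minimal sketch and not the implementation used in this work. The abstract does not specify how the monitor is constructed, so everything here is an assumption made for illustration: the class `KnnSafetyMonitor`, the helper `filter_action`, the parameters `k` and `max_distance`, the use of Euclidean distance on a concatenated state-action vector, and the rule that a proposed input is accepted only if all of its k nearest previously observed samples were safe and lie close enough to it.

```python
import numpy as np


class KnnSafetyMonitor:
    """Hypothetical k-NN safety check over previously observed operating points."""

    def __init__(self, k=3, max_distance=0.5):
        self.k = k                        # number of neighbours to consult (assumed)
        self.max_distance = max_distance  # reject queries far from known data (assumed)
        self.points = []                  # observed state-action vectors
        self.labels = []                  # True if the observation was safe

    def add_sample(self, point, safe):
        """Store one experimentally observed point and its safety label."""
        self.points.append(np.asarray(point, dtype=float))
        self.labels.append(bool(safe))

    def is_safe(self, query):
        """Accept only if all k nearest samples are safe and nearby."""
        if len(self.points) < self.k:
            return False  # too little data: be conservative
        pts = np.stack(self.points)
        dists = np.linalg.norm(pts - np.asarray(query, dtype=float), axis=1)
        nearest = np.argsort(dists)[: self.k]
        all_safe = np.asarray(self.labels)[nearest].all()
        close_enough = dists[nearest].max() <= self.max_distance
        return bool(all_safe and close_enough)


def filter_action(monitor, state, proposed_action, fallback_action):
    """Pass the agent's action through, or substitute a fallback if flagged."""
    query = np.concatenate([np.atleast_1d(state), np.atleast_1d(proposed_action)])
    return proposed_action if monitor.is_safe(query) else fallback_action
```

In this sketch, the distance threshold makes the monitor conservative in unexplored regions, which reflects the statement that operating limits are not known a priori. The fallback action stands in for whatever known-safe input a real testbench would provide.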
