This work introduces a toolchain for applying Reinforcement Learning (RL), specifically the Deep Deterministic Policy Gradient (DDPG) algorithm, in safety-critical real-world environments. As an exemplary application, transient load control is demonstrated on a single-cylinder internal combustion engine testbench in Homogeneous Charge Compression Ignition (HCCI) mode, which offers high thermal efficiency and low emissions. However, HCCI poses challenges for traditional control methods due to its nonlinear, autoregressive, and stochastic nature. RL offers a viable alternative; however, safety concerns such as excessive pressure rise rates must be addressed when applying it to HCCI. A single unsuitable control input can severely damage the engine or cause misfiring and engine shutdown. Additionally, operating limits are not known a priori and must be determined experimentally. To mitigate these risks, real-time safety monitoring based on the k-nearest neighbor algorithm is implemented, enabling safe interaction with the testbench. The feasibility of this approach is demonstrated as the RL agent learns a control policy through interaction with the testbench. A root mean square error of 0.1374 bar is achieved for the indicated mean effective pressure, comparable to neural network-based controllers from the literature. The toolchain's flexibility is further demonstrated by adapting the agent's policy to increase ethanol energy shares, promoting renewable fuel use while maintaining safety. This approach addresses the longstanding challenge of applying RL to safety-critical real-world environments. The developed toolchain, with its adaptability and safety mechanisms, paves the way for future applicability of RL in engine testbenches and other safety-critical settings.
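The k-nearest-neighbor safety monitor described above can be illustrated with a minimal sketch: a candidate control input is accepted only if it lies close (in mean distance to its k nearest neighbors) to operating points already verified as safe. The function name, distance metric, and the `k` and `threshold` parameters below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def knn_safety_check(candidate, safe_history, k=3, threshold=0.5):
    """Accept a candidate control input only if its mean distance to the
    k nearest previously-safe inputs is below a threshold.

    candidate    -- 1-D array, the proposed control input
    safe_history -- 2-D array of control inputs already verified safe
    k, threshold -- hypothetical tuning parameters (assumptions)
    """
    # Euclidean distance from the candidate to every known-safe input
    dists = np.linalg.norm(safe_history - candidate, axis=1)
    # Average over the k closest safe points; small mean => familiar region
    k_nearest = np.sort(dists)[:k]
    return bool(k_nearest.mean() < threshold)

# Usage: gate the RL agent's actions before they reach the testbench.
history = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
print(knn_safety_check(np.array([0.05, 0.05]), history))  # near safe points -> True
print(knn_safety_check(np.array([2.0, 2.0]), history))    # far from them   -> False
```

In this scheme, an action rejected by the monitor would be replaced or clipped before actuation, so the agent can explore without ever sending an input far outside the experimentally established safe envelope.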