Although Deep Reinforcement Learning (DRL) has shown outstanding results in robotics and games, applying it to the optimization of industrial processes such as wastewater treatment remains challenging. One challenge is the lack of a simulation environment that represents the actual plant accurately enough to train DRL policies. The stochasticity and non-linearity of wastewater treatment data lead to unstable and inaccurate model predictions over long time horizons. One possible cause of this incorrect simulation behavior is compounding error, the accumulation of errors over the course of a simulation. Compounding error arises because the model uses its own predictions as inputs at each time step, so the error between the actual data and the prediction grows as the simulation continues. We implemented two methods to improve models trained on wastewater treatment data, which resulted in more accurate simulators: (1) using the model's own predictions as inputs during training as a means of correction, and (2) changing the loss function to account for the long-term predicted shape (dynamics). The experimental results show that these methods can improve simulator behavior, measured by Dynamic Time Warping over a one-year simulation, by up to 98% compared to the base model. These improvements show significant promise for building simulators of biological processes that require no prior knowledge of the process and instead rely exclusively on time series data obtained from the system.
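To make the notion of compounding error concrete, the following is a minimal sketch rather than the method used in this work. It assumes a hypothetical one-dimensional "plant" (`true_step`) and a hypothetical learned model (`model_step`) that differs from it by a small constant error; neither comes from the wastewater data. It compares teacher-forced one-step predictions, where every input is a measured value, with a free-running rollout, where each prediction is fed back as the next input, and scores the rollout with a textbook Dynamic Time Warping distance.

```python
import math

def true_step(x):
    """Hypothetical 'plant': a mildly non-linear one-step map (assumption)."""
    return 0.9 * x + 0.5 * math.sin(x)

def model_step(x):
    """Hypothetical learned model: the plant plus a small constant error (assumption)."""
    return 0.9 * x + 0.5 * math.sin(x) + 0.01

def rollout(step, x0, horizon):
    """Free-running simulation: each prediction becomes the next input."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(step(xs[-1]))
    return xs

def one_step_predictions(step, actual):
    """Teacher-forced predictions: every input is taken from the measured series."""
    return [actual[0]] + [step(x) for x in actual[:-1]]

def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) Dynamic Time Warping, absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

actual = rollout(true_step, 1.0, 50)                      # stands in for measured data
teacher_forced = one_step_predictions(model_step, actual)
free_running = rollout(model_step, 1.0, 50)

max_err_teacher = max(abs(p - y) for p, y in zip(teacher_forced, actual))
max_err_free = max(abs(p - y) for p, y in zip(free_running, actual))

print("max one-step error:    ", max_err_teacher)
print("max free-running error:", max_err_free)
print("DTW(free-running, actual):", dtw_distance(free_running, actual))
```

In this toy setting the one-step error never exceeds the model's per-step error, while the free-running error grows beyond it because each step inherits the previous step's error. This gap between one-step accuracy and rollout accuracy is what training on the model's own predictions is meant to narrow.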