In this paper we investigate the use of reinforcement-learning-based prediction approaches for a real drinking-water treatment plant. Developing such a prediction system is a critical step on the path to optimizing and automating water treatment. Before such a system can be built, many questions must be answered: how predictable the data are, which neural network architectures are suitable, how to overcome partial observability, and more. We first describe the dataset collected from this plant, and highlight challenges with seasonality, nonstationarity, partial observability, and heterogeneity across sensors and operation modes of the plant. We then describe General Value Function (GVF) predictions, which are discounted cumulative sums of observations, and highlight why they might be preferable to the classical n-step predictions common in time series prediction. We discuss how to use offline data to appropriately pre-train the temporal difference (TD) learning agents that learn these GVF predictions, including how to select hyperparameters for online fine-tuning in deployment. We find that the TD-prediction agent obtains an overall lower normalized mean-squared error than the n-step prediction agent. Finally, we show the importance of learning in deployment by comparing a TD agent trained purely offline, with no online updating, to a TD agent that learns online. This final result is one of the first to motivate the importance of adapting predictions in real time for nonstationary, high-volume systems in the real world.
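To make the distinction between the two prediction targets concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes one common GVF convention, in which the target at time t is the discounted sum of observations from t+1 onward, truncated at the end of the recorded series, and it assumes linear function approximation for the TD(0) update. The function names `gvf_returns`, `n_step_targets`, and `td0_update` are hypothetical.

```python
def gvf_returns(cumulant, gamma):
    """GVF target under the assumed convention: G_t = c_{t+1} + gamma * G_{t+1}.

    Computed backwards over a finite series; the last entry is 0.0 because
    no future observations remain (truncation, an assumption of this sketch).
    """
    returns = [0.0] * len(cumulant)
    g = 0.0
    for t in range(len(cumulant) - 2, -1, -1):
        g = cumulant[t + 1] + gamma * g
        returns[t] = g
    return returns


def n_step_targets(cumulant, n):
    """Classical n-step target: the observation exactly n steps ahead.

    Entries are None where the series ends before step t + n.
    """
    size = len(cumulant)
    return [cumulant[t + n] if t + n < size else None for t in range(size)]


def td0_update(w, x_t, x_next, c_next, gamma, alpha):
    """One linear TD(0) step toward the GVF target.

    delta = c_{t+1} + gamma * v(x_{t+1}) - v(x_t), with v(x) = w . x,
    followed by w <- w + alpha * delta * x_t.
    """
    v_t = sum(wi * xi for wi, xi in zip(w, x_t))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = c_next + gamma * v_next - v_t
    return [wi + alpha * delta * xi for wi, xi in zip(w, x_t)]


if __name__ == "__main__":
    sensor = [1.0, 2.0, 3.0, 4.0]  # toy readings, not plant data
    print(gvf_returns(sensor, gamma=0.5))
    print(n_step_targets(sensor, n=2))
    print(td0_update([0.0], [1.0], [1.0], c_next=2.0, gamma=0.5, alpha=0.1))
```

The GVF target summarizes the whole discounted future of a signal, whereas the n-step target looks at a single future point. The TD update bootstraps from the next prediction, so it can be applied one sample at a time as new data arrive, which is what allows the same agent to keep learning online in deployment.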