Despite various breakthroughs in machine learning and data analysis techniques for improving the smart operation and management of urban water infrastructure, some key limitations obstruct further progress. Chief among these are the lack of freely available data, owing to data privacy concerns or the high cost of data collection, and the scarcity of rare or extreme events in the available data. Generative Adversarial Networks (GANs) can help overcome these challenges. In machine learning, generative models are a class of methods capable of learning a data distribution in order to generate artificial data. In this study, we developed a GAN model to generate synthetic time series that augment and balance our limited recorded time series data and improve the accuracy of a data-driven model for combined sewer flow prediction. We considered the sewer system of a small town in Germany as the test case. Precipitation and inflow to the storage tanks are used to develop the data-driven model. The aim is to predict the flow from precipitation data and to examine the impact of data augmentation with synthetic data on model performance. Results show that the GAN can successfully generate synthetic time series from the real data distribution, which enables more accurate peak flow prediction. However, the model without data augmentation performs better for dry weather prediction. Therefore, an ensemble model is suggested to combine the advantages of both models.
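The abstract does not say how the suggested ensemble combines the two models. One plausible scheme, sketched here purely as an illustration, is a regime switch: use the GAN-augmented model's prediction during and shortly after rainfall (where it predicts peaks better) and the non-augmented model's prediction during dry weather (where it performs better). The function name `ensemble_predict`, the rainfall threshold `wet_threshold_mm`, and the `lookback` window (a crude way to account for the lag between rainfall and inflow response) are hypothetical choices, not taken from the study.

```python
from typing import List, Sequence


def ensemble_predict(
    precipitation: Sequence[float],
    pred_augmented: Sequence[float],
    pred_baseline: Sequence[float],
    wet_threshold_mm: float = 0.1,
    lookback: int = 2,
) -> List[float]:
    """Hypothetical regime-switching ensemble (illustrative sketch only).

    At each time step, if any precipitation within the last `lookback` steps
    (including the current one) exceeds `wet_threshold_mm`, the step is
    treated as wet weather and the GAN-augmented model's prediction is used;
    otherwise the baseline (non-augmented) model's prediction is used.
    """
    if not (len(precipitation) == len(pred_augmented) == len(pred_baseline)):
        raise ValueError("all input sequences must have the same length")
    if lookback < 1:
        raise ValueError("lookback must be at least 1")

    combined: List[float] = []
    for t in range(len(precipitation)):
        # Window of recent rainfall, clipped at the start of the series.
        window = precipitation[max(0, t - lookback + 1): t + 1]
        is_wet = max(window) > wet_threshold_mm
        combined.append(pred_augmented[t] if is_wet else pred_baseline[t])
    return combined


if __name__ == "__main__":
    # Toy values for illustration, not data from the study.
    precip = [0.0, 0.0, 2.5, 1.0, 0.0, 0.0, 0.0]
    augmented = [10, 11, 40, 55, 30, 15, 12]
    baseline = [9, 9, 30, 45, 28, 10, 9]
    print(ensemble_predict(precip, augmented, baseline))  # [9, 9, 40, 55, 30, 10, 9]
```

A hard switch keeps the sketch simple; a weighted blend of the two predictions, or a classifier that decides the weather regime, would be natural alternatives under the same idea.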