Water quality prediction using machine learning and neural network approaches

Water resources underpin human livelihoods and economic development, and are closely tied to public health and environmental well-being. Accurate prediction of water quality is therefore central to improving water resource management and controlling pollution. Using several performance metrics, this study evaluates five models (linear regression, Random Forest, XGBoost, LightGBM, and an MLP neural network) for forecasting pH values in Georgia, USA. LightGBM attains the highest average precision among the models examined, and the tree-based models in general perform strongly on this regression task. The performance of the MLP neural network, by contrast, is sensitive to feature scaling. We also analyze why these machine learning models achieve higher precision than the original study, which accounts for temporal dependencies and spatial factors. The main objective of this work is to establish a robust predictive pipeline tailored to practical applications, one that serves not only data science practitioners but also users without specialized knowledge of the application domain. In short, we offer a fresh perspective on achieving relative precision with data science methods, emphasizing both prediction accuracy and interpretability.
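The abstract does not say which performance metrics were used or how features were scaled, so the following is only a minimal sketch of the kind of evaluation it describes. It assumes three common regression metrics (MAE, RMSE, and R²) and z-score standardization as the scaling step for the MLP. All function names and the sample pH values are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Rescale one feature column to zero mean and unit variance (z-score).

    Assumption: z-score scaling is used here only as a typical example of
    the feature scaling that an MLP is sensitive to.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination (undefined if y_true is constant)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical pH observations and model predictions, for illustration only.
ph_observed = [7.0, 7.4, 6.8, 7.2]
ph_predicted = [7.1, 7.3, 6.9, 7.0]

scores = {
    "MAE": mae(ph_observed, ph_predicted),
    "RMSE": rmse(ph_observed, ph_predicted),
    "R2": r2(ph_observed, ph_predicted),
}
print(scores)
```

Computing the same set of scores for each candidate model on a held-out split is one straightforward way to compare them. Tree-based models split on feature thresholds and are unaffected by monotonic rescaling, which is consistent with the abstract's remark that only the MLP is sensitive to scaling.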

