How to out-perform default random forest regression: choosing hyperparameters for applications in large-sample hydrology

Predictions are a central part of water resources research. Historically, physically based models have been preferred; however, they have largely failed to model hydrological processes at the catchment scale, and some important prediction problems cannot be modeled physically at all. As a result, machine learning (ML) models have come to be seen as a valid alternative in recent years. Despite their availability, well-optimized state-of-the-art ML strategies are not widely used in water resources research, because using state-of-the-art ML models and optimizing their hyperparameters requires expert mathematical and statistical knowledge. Further, some analyses require many model trainings, so even expert statisticians sometimes cannot properly optimize hyperparameters. To leverage data effectively to drive scientific advances in the field, it is essential to make ML models accessible to subject matter experts by improving automated machine learning resources. ML models such as XGBoost have recently been shown to outperform random forest (RF) models, which are traditionally used in water resources research. In this study, based on over 150 water-related datasets, we extensively compare XGBoost and RF. This study provides water scientists with access to quick, user-friendly RF and XGBoost model optimization.
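The abstract contrasts default RF settings with optimized hyperparameters. A minimal sketch of that contrast, assuming scikit-learn is installed, compares a default `RandomForestRegressor` against one tuned with `RandomizedSearchCV` on synthetic data. The helper name `compare_default_vs_tuned`, the search space, and the synthetic dataset are illustrative assumptions, not the grid or the datasets used in the study.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split


def compare_default_vs_tuned(X, y, n_iter=10, seed=0):
    """Return (default R^2, tuned R^2, best params) on a held-out test split.

    Hypothetical helper for illustration; the search space below is an
    assumption, not the hyperparameter grid used in the study.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    # Baseline: scikit-learn's default RF settings (only the seed is fixed).
    default_rf = RandomForestRegressor(random_state=seed)
    default_rf.fit(X_train, y_train)
    r2_default = r2_score(y_test, default_rf.predict(X_test))

    # Assumed search space; each key is a real RandomForestRegressor parameter.
    param_distributions = {
        "n_estimators": [50, 100, 200],
        "max_features": [0.33, 0.5, 1.0],  # fraction of features tried per split
        "min_samples_leaf": [1, 2, 5, 10],
        "max_depth": [None, 10, 20],
    }

    # Random search with 3-fold cross-validation on the training split only,
    # so the test split stays untouched until the final comparison.
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=3,
        scoring="r2",
        random_state=seed,
    )
    search.fit(X_train, y_train)  # refits the best configuration on X_train
    r2_tuned = r2_score(y_test, search.best_estimator_.predict(X_test))
    return r2_default, r2_tuned, search.best_params_


if __name__ == "__main__":
    # Synthetic stand-in for a hydrological dataset (assumption).
    X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
    r2_default, r2_tuned, best = compare_default_vs_tuned(X, y)
    print(f"default R^2 = {r2_default:.3f}, tuned R^2 = {r2_tuned:.3f}")
    print("best hyperparameters:", best)
```

On a single held-out split of synthetic data the tuned model is not guaranteed to beat the default; the study's comparison rests on over 150 datasets. Because `xgboost.XGBRegressor` follows the scikit-learn estimator interface, it could in principle be passed to the same kind of search with its own parameter space (for example `learning_rate` and `max_depth`).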
