The main goal of machine learning (ML) is to study and improve mathematical models that can be trained on data provided by the environment to predict future outcomes and make decisions without complete knowledge of all influencing factors. In this work, we describe how ML can be a powerful tool in climate modelling. Tree ring growth has been used in a range of applications, for example in studying the history of buildings and of the environment. As a tree grows, it adds a new layer of wood beneath its bark each year, so that after many years the sequence of ring widths forms a time series. The purpose of this paper is to use ML algorithms and Extreme Value Theory to analyse a set of tree ring width data from nine trees growing in Nottinghamshire. We begin by exploring the data with a variety of descriptive statistical approaches. Transforming the data is important at this stage to reveal any problems for the modelling algorithms. We then use algorithm tuning and ensemble methods to improve the k-nearest neighbors (KNN) algorithm, and compare the method developed in this study with other methods. The extreme values of the dataset are also investigated further. The results show that the Random Forest method gives accurate results in the analysis of the tree ring width data from the nine Nottinghamshire trees, achieving the lowest root mean square error (RMSE). We also find that, as the assumed ARMA model parameters increased, the probability of selecting the true model also increased. In terms of Extreme Value Theory, the Weibull distribution is a good choice for modelling tree ring data.
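As a rough illustration of the tuning step mentioned in the abstract, the sketch below selects the number of neighbours k for a KNN regressor by hold-out RMSE. It is not the authors' implementation: the ring-width series is synthetic, and the lag-based features, the candidate values of k, the 75/25 split, and all function names (`make_lagged`, `knn_predict`, `tune_k`) are assumptions made for this example. The ensemble and Random Forest steps are not covered.

```python
import math
import random


def make_lagged(series, n_lags):
    """Build (features, target) pairs: each target is predicted from the previous n_lags values."""
    X = [series[i - n_lags:i] for i in range(n_lags, len(series))]
    y = [series[i] for i in range(n_lags, len(series))]
    return X, y


def knn_predict(X_train, y_train, x, k):
    """Average the targets of the k training points closest to x (Euclidean distance)."""
    dists = sorted((math.dist(xt, x), yt) for xt, yt in zip(X_train, y_train))
    nearest = dists[:k]
    return sum(t for _, t in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def tune_k(X_train, y_train, X_val, y_val, candidates):
    """Return the candidate k with the lowest validation RMSE, plus all scores."""
    scores = {}
    for k in candidates:
        preds = [knn_predict(X_train, y_train, x, k) for x in X_val]
        scores[k] = rmse(y_val, preds)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for a ring-width series (assumption: an AR(1)-like process
# around a mean width of 1.0; this is NOT the Nottinghamshire data).
rng = random.Random(0)
widths = [1.0]
for _ in range(199):
    widths.append(max(0.05, 1.0 + 0.6 * (widths[-1] - 1.0) + rng.gauss(0.0, 0.2)))

X, y = make_lagged(widths, n_lags=3)
split = int(0.75 * len(X))  # earlier years for training, later years for validation
candidates = [1, 3, 5, 9, 15]
best_k, scores = tune_k(X[:split], y[:split], X[split:], y[split:], candidates)
print("best k:", best_k, "validation RMSE:", round(scores[best_k], 3))
```

The split is chronological rather than random so that the validation years always follow the training years, which is the usual precaution for time series. The same RMSE function could be used to compare the tuned KNN against other models on an equal footing.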