Machine Learning Approach and Extreme Value Theory to Correlated Stochastic Time Series with Application to Tree Ring Data

The main goal of machine learning (ML) is to study and improve mathematical models that can be trained on data provided by the environment to make inferences about the future and to make decisions without complete knowledge of all influencing factors. In this work, we describe how ML can be a powerful tool for climate modelling. Tree ring growth has been used in several fields, for example in studying the history of buildings and of the environment. Each year a tree grows a new layer of wood beneath its bark, so after many years of growth the sequence of tree ring widths forms a time series. The purpose of this paper is to use ML algorithms and Extreme Value Theory to analyse a set of tree ring width data from nine trees growing in Nottinghamshire. We begin by exploring the data with a variety of descriptive statistical approaches; transforming the data at this stage is important for identifying any problems in the modelling algorithm. We then use algorithm tuning and ensemble methods to improve the k-nearest neighbors (KNN) algorithm, and compare the method developed in this study with other methods. The extreme values of the dataset are also investigated in more detail. The results show that the Random Forest algorithm gives accurate results on the tree ring width data from the nine Nottinghamshire trees, with the lowest Root Mean Square Error (RMSE) among the methods compared. We also observe that as the assumed ARMA model parameters increased, the probability of selecting the true model also increased. In terms of Extreme Value Theory, the Weibull distribution appears to be a good choice for modelling tree ring data.
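
The abstract does not give the paper's code, features, or tuning grid, so the following is only a minimal sketch of what "tuning KNN and scoring by RMSE" can look like for a ring width series. It assumes (hypothetically) that each width is predicted from the p preceding widths, that tuning means choosing k by RMSE on a chronological holdout, and that the helper names `make_lagged`, `knn_predict`, `rmse`, and `tune_k` are illustrative, not the authors'. The series is synthetic AR(1) noise, not the Nottinghamshire data, and the ensemble step is not shown.

```python
import math
import random


def make_lagged(series, p):
    """Pair each value with the p values immediately before it (hypothetical lag window)."""
    X, y = [], []
    for t in range(p, len(series)):
        X.append(list(series[t - p:t]))
        y.append(series[t])
    return X, y


def knn_predict(X_train, y_train, x, k):
    """Average the targets of the k training rows nearest to x (Euclidean distance)."""
    ranked = sorted((math.dist(row, x), target) for row, target in zip(X_train, y_train))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def tune_k(series, p, candidates, train_frac=0.8):
    """Split chronologically, then return the k with the lowest holdout RMSE and all scores."""
    X, y = make_lagged(series, p)
    split = int(len(X) * train_frac)
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]
    scores = {}
    for k in candidates:
        preds = [knn_predict(X_tr, y_tr, x, k) for x in X_te]
        scores[k] = rmse(y_te, preds)
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic ring widths (assumption): AR(1) around a mean width of 1.0, kept positive.
rng = random.Random(0)
widths = [1.0]
for _ in range(199):
    widths.append(max(0.05, 1.0 + 0.6 * (widths[-1] - 1.0) + rng.gauss(0, 0.2)))

best_k, scores = tune_k(widths, p=3, candidates=[1, 3, 5, 7, 9])
print(best_k, round(scores[best_k], 3))
```

The split is chronological rather than random so that every holdout ring post-dates the training rings, which avoids leaking later growth into the fit. The same `rmse` function could score a Random Forest or any other regressor on the same split for the kind of comparison the abstract reports.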