In recent years, various distributed and parallel learning schemes have received increasing attention for their advantages in handling large-scale data. To face the big data challenges that have recently arisen in functional data analysis, we propose a novel distributed gradient descent functional learning (DGDFL) algorithm to tackle functional data distributed across numerous local machines (processors) in the framework of reproducing kernel Hilbert spaces. Based on integral operator approaches, we provide the first theoretical understanding of the DGDFL algorithm from many different aspects. As a first step toward understanding DGDFL, we propose and comprehensively study a data-based gradient descent functional learning (GDFL) algorithm associated with the single-machine model. Under mild conditions, confidence-based optimal learning rates of DGDFL are obtained without the saturation boundary on the regularity index suffered by previous works on functional regression. We further provide a semi-supervised DGDFL approach to relax the restriction on the maximal number of local machines required to ensure optimal rates. To the best of our knowledge, DGDFL provides the first divide-and-conquer iterative training approach to functional learning based on data samples of intrinsically infinite-dimensional random functions (functional covariates) and enriches the methodologies of functional data analysis.
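To make the divide-and-conquer scheme concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation: each local machine runs plain gradient descent on its own block of (functional covariate, response) pairs in an RKHS induced by a kernel on curves, and the global estimator averages the local ones. The Gaussian kernel on discretized L2 distances, the step size `eta`, the iteration count `T`, and all function names are illustrative assumptions.

```python
# Hypothetical sketch of divide-and-conquer gradient descent for functional
# regression in an RKHS; kernel choice and hyperparameters are assumptions.
import numpy as np

def l2_gram(X, Z, grid_step, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||_{L2}^2 / (2 sigma^2)),
    with the L2 norm of discretized curves approximated by a Riemann sum."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2) * grid_step
    return np.exp(-d2 / (2.0 * sigma ** 2))

def gdfl(X, y, grid_step, eta=0.5, T=100, sigma=1.0):
    """Single-machine GDFL: T gradient-descent steps on the empirical risk,
    returning the coefficients alpha of f_T = sum_i alpha_i K(x_i, .)."""
    n = len(y)
    G = l2_gram(X, X, grid_step, sigma)
    alpha = np.zeros(n)
    for _ in range(T):
        residual = G @ alpha - y       # f_t(x_i) - y_i at the local samples
        alpha -= (eta / n) * residual  # gradient step in coefficient form
    return alpha

def dgdfl_predict(X_parts, y_parts, X_new, grid_step, **kw):
    """DGDFL: run GDFL on each local data block and average the local
    estimators' predictions at the new functional covariates X_new."""
    preds = []
    for Xj, yj in zip(X_parts, y_parts):
        alpha = gdfl(Xj, yj, grid_step, **kw)
        preds.append(l2_gram(X_new, Xj, grid_step, kw.get("sigma", 1.0)) @ alpha)
    return np.mean(preds, axis=0)

# Toy usage: curves x_i(t) observed on a grid, scalar responses y_i.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50); grid_step = t[1] - t[0]
X = rng.normal(size=(300, 50)).cumsum(axis=1) * np.sqrt(grid_step)  # rough Brownian paths
y = (X ** 2).sum(axis=1) * grid_step + 0.1 * rng.normal(size=300)   # ||x||_{L2}^2 + noise
X_parts, y_parts = np.array_split(X, 3), np.array_split(y, 3)       # m = 3 local machines
print(dgdfl_predict(X_parts, y_parts, X[:5], grid_step, eta=0.5, T=200))
```

Averaging the m local estimators is what lets each machine process only n/m samples; per the abstract, the paper's integral-operator analysis quantifies how large m can be before averaging degrades the optimal rate, and the semi-supervised variant uses unlabeled samples to relax that restriction.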