In recent years, distributed learning schemes of various types have received increasing attention for their strong advantages in handling large-scale data. To address the big data challenges that have recently emerged in functional data analysis, we propose a novel distributed gradient descent functional learning (DGDFL) algorithm that processes functional data across numerous local machines (processors) in the framework of reproducing kernel Hilbert spaces. Based on integral operator approaches, we provide the first theoretical understanding of the DGDFL algorithm in the literature, covering several different aspects. As a first step toward understanding DGDFL, we propose and comprehensively study a data-based gradient descent functional learning (GDFL) algorithm associated with a single-machine model. Under mild conditions, we obtain confidence-based optimal learning rates for DGDFL without the saturation boundary on the regularity index suffered by previous works in functional regression. We further provide a semi-supervised DGDFL approach that weakens the restriction on the maximal number of local machines required to ensure optimal rates. To the best of our knowledge, DGDFL provides the first distributed iterative training approach to functional learning and enriches the landscape of functional data analysis.
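The abstract does not state the update rules, so the following is only a minimal sketch under explicit assumptions. It assumes the following setup:

- The model is the functional linear model Y = ∫ β(t) X(t) dt + ε, with slope β in the RKHS H_K.
- GDFL is the iteration β_{k+1} = β_k − η (1/|D|) Σ_{(X,Y)∈D} (⟨β_k, X⟩_{L²} − Y) L_K X, started from β_0 = 0, where (L_K X)(s) = ∫ K(s,t) X(t) dt.
- DGDFL averages the local GDFL outputs with weights |D_j|/|D|.

Curves are sampled on a uniform midpoint grid, and integrals are replaced by Riemann sums. The Gaussian kernel, its bandwidth, the step-size rule, the iteration count, the helper names (`gaussian_kernel`, `gdfl`, `dgdfl`) and the synthetic data are all hypothetical choices for illustration. The semi-supervised variant is not covered.

```python
import numpy as np


def gaussian_kernel(grid, width=0.5):
    """Gaussian kernel matrix K(s_l, t_j) on a grid (hypothetical choice of K)."""
    diff = grid[:, None] - grid[None, :]
    return np.exp(-diff**2 / (2.0 * width**2))


def gdfl(X, Y, K, w, n_iter=200, eta=None):
    """Single-machine gradient descent functional learning (sketch).

    X: (n, m) covariate curves sampled on the grid; Y: (n,) responses;
    K: (m, m) kernel matrix; w: quadrature weight, so <f, g>_{L2} ~ w * f @ g.
    Returns the estimated slope function beta evaluated on the grid.
    """
    n, m = X.shape
    if eta is None:
        # Assumed step-size rule: eta * lambda_max(K H) <= 1, where H is the
        # Hessian of the discretized risk (1 / 2n) * ||w X beta - Y||^2.
        H = (w**2) * X.T @ X / n
        eta = 1.0 / (np.linalg.norm(K, 2) * np.linalg.norm(H, 2))
    beta = np.zeros(m)                      # beta_0 = 0
    for _ in range(n_iter):
        resid = w * X @ beta - Y            # <beta, X_i>_{L2} - Y_i
        # (1/n) * sum_i resid_i * (L_K X_i), with
        # (L_K X_i)(s) ~ w * sum_j K(s, t_j) X_i(t_j)
        grad = w * K @ (X.T @ resid) / n
        beta = beta - eta * grad
    return beta


def dgdfl(X, Y, K, w, n_machines, n_iter=200, eta=None):
    """Distributed GDFL (sketch): run gdfl on each local subset D_j,
    then average the local estimates with weights |D_j| / |D|."""
    n = X.shape[0]
    beta_bar = np.zeros(X.shape[1])
    for idx in np.array_split(np.arange(n), n_machines):
        beta_j = gdfl(X[idx], Y[idx], K, w, n_iter=n_iter, eta=eta)
        beta_bar += (len(idx) / n) * beta_j
    return beta_bar


# Usage on hypothetical synthetic data
rng = np.random.default_rng(0)
m, n = 100, 600
grid = (np.arange(m) + 0.5) / m             # midpoint grid on [0, 1]
w = 1.0 / m
# Curves X_i(t) = sum_k xi_ik * sqrt(2) * cos(k * pi * t), k = 1..5
basis = np.array([np.sqrt(2) * np.cos(k * np.pi * grid) for k in range(1, 6)])
xi = rng.normal(size=(n, 5)) / np.arange(1, 6)
X = xi @ basis
beta_true = np.sin(2 * np.pi * grid)
Y = w * X @ beta_true + 0.1 * rng.normal(size=n)
K = gaussian_kernel(grid)
beta_hat = dgdfl(X, Y, K, w, n_machines=4)
```

The default step size 1/(‖K‖₂‖H‖₂) is a conservative choice. It makes every local iteration decrease that machine's empirical least-squares risk. The iteration count acts as the regularization parameter (early stopping), so in practice `n_iter` would be tuned rather than fixed.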