This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the problem of classifying a stream of unlabeled data in a distributed manner. We consider two kinds of classification tasks with limited observations in the prediction phase, namely, the statistical classification task and the single-sample classification task. For each task, we describe the distributed learning rule and analyze the probability of error accordingly. To do so, we first introduce a stronger consistent training condition that involves the margin distributions generated by the trained classifiers. Based on this condition, we derive an upper bound on the probability of error for both tasks, which depends on the statistical properties of the data and the combination policy used to combine the distributed classifiers. For the statistical classification problem, we employ the geometric social learning rule and conduct a non-asymptotic performance analysis. An exponential decay of the probability of error with respect to the number of unlabeled samples is observed in the upper bound. For the single-sample classification task, a distributed learning rule that functions as an ensemble classifier is constructed. An upper bound on the probability of error of this ensemble classifier is established.
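To make the statistical classification setting concrete, the following is a minimal sketch of a geometric (log-linear) social learning update for a binary task. It is an illustration under stated assumptions, not the paper's implementation: the function name `geometric_social_learning`, the combination matrix `A` (assumed left-stochastic, with column `k` holding the weights agent `k` assigns to its neighbors), and the use of each trained classifier's logit as a surrogate log-likelihood ratio are all hypothetical choices made for this example.

```python
import numpy as np

def geometric_social_learning(logits, A):
    """Run a log-domain social learning recursion for a binary hypothesis.

    logits : (T, K) array; logits[i, k] is agent k's classifier output
             (assumed to act like a log-likelihood ratio) for sample i.
    A      : (K, K) array; A[l, k] is the weight agent k gives to agent l.
             Assumed left-stochastic (each column sums to 1).
    Returns a (T, K) array of log-belief ratios after each sample.
    """
    T, K = logits.shape
    lam = np.zeros(K)               # uniform prior: log-belief ratio of 0
    history = np.empty((T, K))
    for i in range(T):
        # Each agent adds its new local evidence, then agents combine.
        # Geometric averaging of beliefs is linear averaging of log-beliefs.
        lam = A.T @ (lam + logits[i])
        history[i] = lam
    return history

# Hypothetical example: 3 fully connected agents with uniform weights.
A = np.full((3, 3), 1.0 / 3.0)
rng = np.random.default_rng(0)
logits = rng.normal(loc=0.5, scale=1.0, size=(50, 3))  # synthetic scores
beliefs = geometric_social_learning(logits, A)
decisions = np.sign(beliefs[-1])    # +1 / -1 class decision per agent
```

With uniform weights every agent holds the same log-belief ratio after each step, and that ratio accumulates evidence over the stream of unlabeled samples. This accumulation is the intuition behind an error bound that decays with the number of samples.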