Numerous lines of work aim to control $\textit{model disagreement}$ -- the extent to which two machine learning models disagree in their predictions. We adopt a simple and standard notion of model disagreement for real-valued prediction problems, namely the expected squared difference in predictions between two models trained on independent samples, without any coordination of the training processes. We would like to drive disagreement to zero with some natural parameter(s) of the training procedure, using analyses that apply to existing training methodologies. We develop a simple, general technique for proving bounds on independent model disagreement based on $\textit{anchoring}$ to the average of the two models within the analysis. We then apply this technique to prove disagreement bounds for four commonly used machine learning algorithms: (1) stacked aggregation over an arbitrary model class (where disagreement is driven to 0 with the number of models $k$ being stacked); (2) gradient boosting (where disagreement is driven to 0 with the number of iterations $k$); (3) neural network training with architecture search (where disagreement is driven to 0 with the size $n$ of the architecture being optimized over); and (4) regression tree training over all regression trees of fixed depth (where disagreement is driven to 0 with the depth $d$ of the tree architecture). For clarity, we work out our initial bounds in the setting of one-dimensional regression with squared error loss, and then show that all of our results generalize to multi-dimensional regression with any strongly convex loss.
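As a point of reference, the disagreement notion above can be written formally as follows (a minimal formalization; the notation $f_{S_i}$ and $\mathcal{D}$ is introduced here for illustration and is not fixed by this section):
$$\mathrm{Disagree}\big(f_{S_1}, f_{S_2}\big) \;=\; \mathbb{E}\!\left[\big(f_{S_1}(x) - f_{S_2}(x)\big)^2\right],$$
where $S_1$ and $S_2$ are training samples drawn independently from the same distribution $\mathcal{D}$, $f_{S_i}$ is the model produced by running the training procedure on $S_i$ with no coordination between the two runs, and the expectation is over the test point $x \sim \mathcal{D}$ together with the draws of $S_1$, $S_2$, and any internal training randomness.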