A central problem in data science is to use potentially noisy samples of an unknown function to predict values for unseen inputs. In classical statistics, predictive error is understood as a trade-off between bias and variance that balances model simplicity against the ability to fit complex functions. However, over-parameterized models exhibit counterintuitive behaviors, such as "double descent," in which generalization error decreases again as model complexity grows beyond the interpolation threshold; other models exhibit still more complicated patterns of predictive error with multiple peaks and valleys. Neither double descent nor these multiple-descent phenomena are well explained by the bias-variance decomposition. We introduce a novel decomposition, the generalized aliasing decomposition (GAD), to explain the relationship between predictive performance and model complexity. The GAD decomposes the predictive error into three parts: (1) model insufficiency, which dominates when the number of parameters is much smaller than the number of data points; (2) data insufficiency, which dominates when the number of parameters is much greater than the number of data points; and (3) generalized aliasing, which dominates between these two extremes. We demonstrate the applicability of the GAD to diverse applications, including random feature models from machine learning, Fourier transforms from signal processing, solution methods for differential equations, and the prediction of formation enthalpy in materials discovery. Because key components of the GAD can be computed explicitly from the relationship between the model class and the samples, without seeing any data labels, it can answer questions related to experimental design and model selection before data are collected or experiments performed. We illustrate this use on several examples and discuss the implications for predictive modeling and data science.
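To make the label-free claim concrete, the following minimal Python sketch (not the authors' code) illustrates aliasing in the classical linear least-squares setting: with the feature matrix split into modeled columns X1 and unmodeled columns X2, the standard aliasing matrix A = pinv(X1) @ X2 depends only on the sample locations and the model class, never on the observed labels. The polynomial feature construction and the particular split into modeled versus unmodeled columns are illustrative assumptions.

```python
# A minimal sketch, assuming a linear model with a polynomial feature basis.
# The aliasing matrix A = pinv(X1) @ X2 describes how unmodeled modes (X2)
# masquerade as modeled ones (X1); note that it never touches labels y.
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_modeled, n_unmodeled = 30, 10, 40
x = rng.uniform(-1.0, 1.0, size=n_samples)  # hypothetical sample locations

# Monomial features 1, x, x^2, ...; any fixed basis would do.
X_full = np.vander(x, N=n_modeled + n_unmodeled, increasing=True)
X1, X2 = X_full[:, :n_modeled], X_full[:, n_modeled:]

# Label-free: computable from the design alone, i.e., before any data are
# collected at these sample locations.
aliasing = np.linalg.pinv(X1) @ X2
print("aliasing matrix shape:", aliasing.shape)
print("largest aliasing coefficient:", np.abs(aliasing).max())
```

Because the matrix above is fixed once the sample locations and model class are chosen, one can compare candidate experimental designs by their aliasing structure before running any experiment, which is the use case the abstract highlights.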