Ensemble models often improve generalization performance on challenging tasks. Yet traditional techniques based on prediction averaging incur three well-known disadvantages: the computational overhead of training multiple models, and increased latency and memory requirements at test time. To address these issues, the Stochastic Weight Averaging (SWA) technique maintains a running average of model parameters from a specific epoch onward. Despite its potential benefits, maintaining a running average of parameters can hinder generalization, as the underlying running model begins to overfit. Conversely, an inadequately chosen starting point can render SWA more susceptible to underfitting than the underlying running model. In this work, we propose the Adaptive Stochastic Weight Averaging (ASWA) technique, which updates a running average of model parameters only when generalization performance improves on the validation dataset. Hence, ASWA can be seen as a combination of SWA with the early stopping technique, where the former accepts all updates on a parameter ensemble model and the latter rejects any update on an underlying running model. We conduct extensive experiments ranging from image classification to multi-hop reasoning over knowledge graphs. Our experiments over 11 benchmark datasets with 7 baseline models suggest that ASWA leads to statistically better generalization across models and datasets.
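The accept/reject rule described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: parameters are modeled as plain dicts of floats, and `evaluate` stands in for whatever validation-performance metric is used (higher is assumed better).

```python
def aswa_step(avg_params, model_params, n_avg, best_val, evaluate):
    """One ASWA step (illustrative sketch): fold the current model
    parameters into the running average only if the resulting
    parameter ensemble improves validation performance."""
    # Tentative running average including the current parameters.
    candidate = {k: (avg_params[k] * n_avg + model_params[k]) / (n_avg + 1)
                 for k in avg_params}
    score = evaluate(candidate)  # validation score of the candidate ensemble
    if score > best_val:
        # SWA-like behavior: accept the update on the parameter ensemble.
        return candidate, n_avg + 1, score
    # Early-stopping-like behavior: reject the update, keep the previous ensemble.
    return avg_params, n_avg, best_val
```

Calling `aswa_step` once per epoch (or per evaluation interval) interpolates between plain SWA, which would accept every update, and early stopping, which would freeze the parameters once validation performance stops improving.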