Motivated in part by their relevance for low-precision training and quantization, massive activations in large language models (LLMs) have recently emerged as a topic of interest. However, existing analyses are limited in scope, and their generalizability across architectures is unclear. This paper helps address these gaps by analyzing massive activations across a broad range of LLMs, covering both GLU-based and non-GLU-based architectures. Our findings challenge several prior assumptions, most importantly: (1) not all massive activations are detrimental, i.e., suppressing them does not lead to an explosion of perplexity or a collapse in downstream task performance; (2) proposed mitigation strategies such as Attention KV bias are model-specific and ineffective in certain cases. We consequently investigate novel hybrid mitigation strategies; in particular, pairing Target Variance Rescaling (TVR) with Attention KV bias or Dynamic Tanh (DyT) successfully balances the mitigation of massive activations with preserved downstream model performance in the scenarios we investigated. Our code is available at: https://github.com/bluorion-com/refine_massive_activations.