Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Large language models · Language models · Training data · Model reconstruction · Analysis

April 3, 2023

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal

From arXiv. Code at https://github.com/EleutherAI/pythia

How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend Pythia to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.
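The abstract gives the checkpoint count (154 per model) but not the schedule. As a minimal sketch, the snippet below enumerates the checkpoint names under the schedule described in the project's repository, which is an assumption here and is not stated in the abstract: one checkpoint at step 0, ten log-spaced checkpoints at steps 1 through 512, and one checkpoint every 1000 steps from 1000 to 143000. The function name `pythia_checkpoint_revisions` is hypothetical and chosen only for illustration.

```python
def pythia_checkpoint_revisions():
    """List checkpoint names, ASSUMING the schedule from the Pythia repository.

    The abstract only states that there are 154 checkpoints per model;
    the step values below are an assumption taken from the project README.
    """
    initial = [0]                                # untrained model
    log_spaced = [2 ** i for i in range(10)]     # 1, 2, 4, ..., 512
    linear = list(range(1000, 143001, 1000))     # 1000, 2000, ..., 143000
    return [f"step{step}" for step in initial + log_spaced + linear]


revisions = pythia_checkpoint_revisions()
print(len(revisions))   # 1 + 10 + 143 checkpoints
print(revisions[:5])
```

Under the same assumption, each name would correspond to a branch of a model's Hugging Face repository, so it could be passed as the `revision` argument of `from_pretrained` to load an intermediate checkpoint.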
