Scaling Law Papers - 专知

Scaling Law

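Current research indicates that scaling up model size is a key factor in improving LLM capabilities. The progression from GPT-3 at 175B parameters to PaLM at 540B parameters supports the view that larger models tend to be more capable. A large model size is indispensable, but scaling laws are not limited to it; they cover three factors: model size, data size, and total compute. In addition, the quality of pre-training data plays a key role in model performance, so data collection and cleaning strategies deserve attention when the corpus is expanded.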
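The relationship between the three factors named above (model size, data size, total compute) can be illustrated with a small sketch. It assumes the widely used rule of thumb for dense Transformers, C ≈ 6 · N · D, where N is the parameter count and D is the number of training tokens. This approximation and the token count below are not taken from this page; treat both as assumptions for illustration only.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough training compute in FLOPs, using the C ~= 6 * N * D rule of thumb.

    Assumption: dense Transformer, forward + backward pass costs about
    6 FLOPs per parameter per token. This is an approximation, not an
    exact accounting.
    """
    return 6.0 * n_params * n_tokens


def tokens_for_budget(compute_flops: float, n_params: float) -> float:
    """Tokens affordable at a fixed compute budget, from the same rule of thumb."""
    return compute_flops / (6.0 * n_params)


if __name__ == "__main__":
    # 175B parameters comes from the text above; the 300B-token figure is an
    # assumed, illustrative value and is not stated on this page.
    n_params = 175e9
    n_tokens = 300e9
    budget = training_flops(n_params, n_tokens)
    print(f"estimated training compute: {budget:.2e} FLOPs")

    # At a fixed budget, doubling the model size halves the affordable tokens.
    print(f"tokens at 2x model size: {tokens_for_budget(budget, 2 * n_params):.2e}")
```

Under this approximation the three factors are not independent: fixing any two determines the third, which is why scaling studies usually vary model size and data size jointly under a compute budget.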

Active Inference as the Test-Time Scaling Law for Physical AI Agents

Arxiv

0+ reads · June 22

From Zipf's Law to Neural Scaling through Heaps' Law and Hilberg's Hypothesis

Arxiv

0+ reads · June 22

When to use what Schatten-$p$ norm in deep learning?

Arxiv

0+ reads · June 13

Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design

Arxiv

0+ reads · June 5

Unified Neural Scaling Laws

Arxiv

0+ reads · May 25

Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation

Arxiv

0+ reads · June 1

Scaling Laws for Behavioral Foundation Models over User Event Sequences

Arxiv

0+ reads · June 3

Unifying Learning Dynamics and Generalization in Transformers Scaling Law

Arxiv

0+ reads · June 10

Asymmetric Scaling Laws from Sparse Features

Arxiv

0+ reads · May 22

Active Budget Allocation for Efficient Scaling Law Estimation via Surrogate-Guided Pruning

Arxiv

0+ reads · May 17

Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos

Arxiv

0+ reads · May 20

Scaling Laws for Mixture Pretraining Under Data Constraints

Arxiv

0+ reads · May 15

Allometric Scaling Laws for Bipedal Robots

Arxiv

0+ reads · April 6

A Limit Theory of Foundation Models: A Mathematical Approach to Understanding Emergent Intelligence and Scaling Laws

Arxiv

0+ reads · April 28

Modernizing Amdahl's Law: How AI Scaling Laws Shape Computer Architecture

Arxiv

0+ reads · March 27
