Deep learning methods are transforming research, enabling new techniques, and ultimately leading to new discoveries. As the demand for more capable AI models continues to grow, we are entering an era of Trillion Parameter Models (TPMs), models with more than a trillion parameters, such as Huawei's PanGu-$\Sigma$. We describe a vision for an ecosystem of TPM users and providers that caters to the specific needs of the scientific community. We then outline the significant technical challenges and open problems in system design for serving TPMs to enable scientific research and discovery. Specifically, we describe the requirements for a comprehensive software stack and interfaces that support the diverse and flexible needs of researchers.