Large language models (LLMs) have attracted significant attention in the AI community and beyond. Among them, the Generative Pre-trained Transformer (GPT) has emerged as the dominant architecture and has spawned numerous variants. However, these variants were pre-trained under differing conditions, including different input data, data preprocessing, and training methodologies, so controlled comparative studies are lacking. Here we examine two prominent open-source GPT architectures, GPT-NeoX and LLaMA, using Frontier, the world's first exascale supercomputer. Using the same materials science text corpus and a comprehensive end-to-end pipeline, we compare their training and downstream performance. We achieve state-of-the-art performance on a challenging materials science benchmark. Furthermore, we investigate computational and energy efficiency, and propose a computationally efficient method for architecture design. To our knowledge, these pre-trained models are the largest available for materials science. Our findings provide practical guidance for building LLMs on HPC platforms.