Artificial intelligence (AI) methods have become critical in scientific applications, where they help accelerate scientific discovery. Large language models (LLMs) are being considered as a promising approach to some of the most challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications are contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications. However, the comparative performance of these AI accelerators on large language models has not been previously studied. In this paper, we systematically study LLMs on multiple AI accelerators and GPUs and evaluate the performance characteristics of these systems on such models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT-2 model, and (iii) an LLM-driven science use case, GenSLM. We present our findings and analyses of the models' performance to better understand the intrinsic capabilities of AI accelerators. Our analysis also takes into account key factors such as sequence length, scaling behavior, sparsity, and sensitivity to gradient accumulation steps.