The rapid advancement of machine learning (ML) technologies has driven the development of specialized hardware accelerators designed to enable more efficient model training. This paper introduces the CARAML benchmark suite, which assesses performance and energy consumption during the training of transformer-based large language models and computer vision models on a range of hardware accelerators, including systems from NVIDIA, AMD, and Graphcore. CARAML provides a compact, automated, extensible, and reproducible framework for evaluating ML workloads across novel hardware architectures. The design and implementation of CARAML, along with a custom power-measurement tool called jpwr, are discussed in detail.