This paper investigates the architectural features and performance potential of the Apple Silicon M-Series SoCs (M1, M2, M3, and M4) for high-performance computing (HPC). We provide a detailed review of the CPU and GPU designs, the unified memory architecture, and coprocessors such as Advanced Matrix Extensions (AMX). We design and develop benchmarks in the Metal Shading Language and Objective-C++ to assess computational and memory performance. We also measure power consumption and efficiency using Apple's powermetrics tool. Our results show that the M-Series chips offer relatively high memory bandwidth and significant generation-over-generation improvements in computational performance. The GPU outperforms the CPU from the M2 onward, peaking at 2.9 FP32 TFLOPS on the M4. Power consumption ranges from a few watts to 10-20 watts, and the GPU and accelerator on all four chips reach an efficiency of more than 200 GFLOPS per watt. Despite limited FP64 support on the GPU, the M-Series chips demonstrate strong potential for energy-efficient HPC applications. Our analysis examines whether the M-Series chips provide a competitive alternative to traditional HPC architectures or represent a distinct category altogether, making this an apples-to-oranges comparison.