Data movement is a key bottleneck for both performance and energy efficiency in modern HPC systems. The NEC SX-series supercomputers have a long history of accelerating memory-intensive HPC applications by providing them with sufficient memory bandwidth. In this paper, we analyze the performance of a prototype SX-Aurora TSUBASA supercomputer equipped with the brand-new Vector Engine (VE30) processor. VE30 is the first major update to the Vector Engine processor series and offers significantly improved memory access performance thanks to its redesigned memory subsystem. Moreover, it introduces new instructions and incorporates architectural advancements tailored to accelerating memory-intensive applications. Using standard benchmarks, we demonstrate that VE30 considerably outperforms other processors in both performance and efficiency on memory-intensive applications. We also evaluate VE30 using applications including SPEChpc, and show that VE30 can run real-world applications with high performance. Finally, we discuss performance tuning techniques for obtaining maximum performance from VE30.