Heterogeneous supercomputers have become the standard in HPC. GPUs in particular dominate the accelerator landscape, offering unprecedented performance on parallel workloads and unlocking new possibilities in fields such as AI and climate modeling. As many workloads become memory-bound, improving communication latency and bandwidth within the system has become a main driver in the development of new architectures. The Grace Hopper Superchip (GH200) is a significant step toward tightly coupled heterogeneous systems, in which all CPUs and GPUs share a unified address space and support transparent fine-grained access to all main memory in the system. We characterize both intra- and inter-node memory operations on the Quad GH200 nodes of the new Alps supercomputer at the Swiss National Supercomputing Centre, and demonstrate the importance of careful memory placement on example workloads, highlighting tradeoffs and opportunities.