Parallelizing a modern GPU simulator

Simulators are a primary tool in computer architecture research, but they are extremely computationally intensive. Simulating modern architectures with higher core counts and recent workloads can be challenging, even on modern hardware. This paper demonstrates that simulating some GPGPU workloads in a single-threaded state-of-the-art simulator such as Accel-sim can take more than five days. We present a simple approach that parallelizes this simulator with minimal code changes by using OpenMP. Moreover, our parallelization technique is deterministic, so the simulator produces the same results for single-threaded and multi-threaded simulations. Compared to previous work, we achieve a higher speed-up and, more importantly, the parallel simulation does not incur any inaccuracies. When we run the simulator with 16 threads, we achieve an average speed-up of 5.8x and reach 14x in some workloads. This allows researchers to simulate applications that take five days in less than 12 hours. By speeding up simulations, researchers can model larger systems, simulate bigger workloads, add more detail to the model, increase the efficiency of the hardware platform where the simulator is run, and obtain results sooner.
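The abstract does not spell out how determinism is preserved, so the following is only a minimal sketch of one common pattern, not the paper's implementation and not Accel-sim code. It assumes that each simulated core can be advanced for one cycle using only its own state, and that everything touching shared state is applied afterwards in a fixed core order. The names `Core`, `step_core`, and `simulate` are hypothetical, and the arithmetic inside `step_core` is a stand-in for real pipeline work.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-core state (illustrative only, not an Accel-sim structure).
struct Core {
    std::uint64_t state = 0;             // private state, touched only by its own iteration
    std::vector<std::uint64_t> outbox;   // requests this core produced in the current cycle
};

// Advance one core by one cycle. Reads and writes only `core`,
// so different cores can be stepped concurrently without races.
void step_core(Core& core, std::uint64_t cycle) {
    // A 64-bit linear congruential step stands in for real pipeline work.
    core.state = core.state * 6364136223846793005ULL + 1442695040888963407ULL + cycle;
    if ((core.state >> 60) == 0) {       // roughly 1 cycle in 16 issues a request
        core.outbox.push_back(core.state);
    }
}

// Run the toy simulation and return a checksum that depends on the order
// in which requests reach the shared part of the model.
std::uint64_t simulate(int num_cores, int num_cycles, int num_threads) {
    assert(num_cores > 0 && num_cycles >= 0 && num_threads > 0);

    std::vector<Core> cores(static_cast<std::size_t>(num_cores));
    for (int i = 0; i < num_cores; ++i) {
        cores[static_cast<std::size_t>(i)].state = static_cast<std::uint64_t>(i) + 1;
    }

    std::uint64_t shared = 0;            // stands in for shared memory-system state
    for (int cycle = 0; cycle < num_cycles; ++cycle) {
        // Parallel phase: each iteration touches exactly one core.
        #pragma omp parallel for num_threads(num_threads) schedule(static)
        for (int i = 0; i < num_cores; ++i) {
            step_core(cores[static_cast<std::size_t>(i)], static_cast<std::uint64_t>(cycle));
        }

        // Serial phase: drain outboxes in core-index order, so the shared
        // state sees the same request order for any thread count.
        for (Core& core : cores) {
            for (std::uint64_t request : core.outbox) {
                shared = shared * 31 + request;
            }
            core.outbox.clear();
        }
    }
    return shared;
}
```

Under these assumptions, `simulate(8, 1000, 1)` and `simulate(8, 1000, 16)` return the same checksum, because no thread ever writes state that another thread reads within a cycle. Built without OpenMP support (for example, without `-fopenmp` on GCC or Clang), the pragma is ignored and the code runs serially with the same result. As a check on the numbers quoted above: five days is 120 hours, so finishing in under 12 hours needs a speed-up of at least 10x, which the 14x cases satisfy (120 / 14 ≈ 8.6 hours), while the 5.8x average would give roughly 20.7 hours.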
