Simulators are a primary tool in computer architecture research, but they are extremely computationally intensive. Simulating modern architectures with growing core counts and recent workloads can be challenging, even on modern hardware. We show that simulating some GPGPU workloads in a single-threaded, state-of-the-art simulator such as Accel-sim can take more than five days. We present a simple approach that parallelizes this simulator with OpenMP and requires minimal code changes. Moreover, our parallelization technique is deterministic: the simulator produces the same results in single-threaded and multi-threaded runs. Compared to previous work, we achieve a higher speed-up and, more importantly, the parallel simulation introduces no inaccuracy. With 16 threads, we achieve an average speed-up of 5.8x, reaching 14x for some workloads. This allows researchers to simulate applications that take five days in less than 12 hours. Faster simulation lets researchers model larger systems, simulate bigger workloads, add more detail to the model, make more efficient use of the hardware platform on which the simulator runs, and obtain results sooner.
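The abstract does not describe how the deterministic parallelization is implemented, so the following C++ sketch shows only one common pattern that has the stated property, not the paper's code. It assumes that the simulated cores can be advanced independently within a cycle and that their effects on shared state are applied afterwards in a fixed order. All names here (`Core`, `step_core`, `simulate`) are hypothetical and do not come from Accel-sim.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-core state; this is not an Accel-sim type.
struct Core {
    std::uint64_t state = 0;            // private to this core
    std::vector<std::uint64_t> outbox;  // requests produced in the current cycle
};

// Advance one core by one cycle. It reads and writes only its own data,
// so different cores can be stepped concurrently without data races.
inline void step_core(Core& c, std::uint64_t cycle) {
    c.state = c.state * 6364136223846793005ULL + 1442695040888963407ULL + cycle;
    if ((c.state >> 60) < 4) {          // occasionally emit a request
        c.outbox.push_back(c.state);
    }
}

// Run the toy simulation and return a checksum of the shared state.
// `parallel` toggles the OpenMP region through the `if` clause.
std::uint64_t simulate(int num_cores, int num_cycles, bool parallel) {
    assert(num_cores > 0);
    std::vector<Core> cores(static_cast<std::size_t>(num_cores));
    for (int i = 0; i < num_cores; ++i) {
        cores[i].state = static_cast<std::uint64_t>(i) + 1;
    }
    std::uint64_t shared = 0;  // stand-in for interconnect/memory state

    for (int cycle = 0; cycle < num_cycles; ++cycle) {
        // Phase 1: step all cores. Iterations touch disjoint data, so the
        // outcome does not depend on how iterations are assigned to threads.
        #pragma omp parallel for schedule(static) if(parallel)
        for (int i = 0; i < num_cores; ++i) {
            step_core(cores[i], static_cast<std::uint64_t>(cycle));
        }

        // Phase 2: apply the requests to shared state sequentially, always
        // in core-index order. This fixed order is what makes the result
        // identical for any thread count.
        for (int i = 0; i < num_cores; ++i) {
            for (std::uint64_t req : cores[i].outbox) {
                shared = shared * 31 + req;
            }
            cores[i].outbox.clear();
        }
    }
    return shared;
}
```

Calling `simulate(8, 1000, false)` and `simulate(8, 1000, true)` should return the same checksum, which mirrors the property claimed in the abstract: the same results for single-threaded and multi-threaded runs. When compiled without OpenMP support, the pragma is ignored and the code runs sequentially. In this sketch the sequential merge phase limits the achievable speed-up, so how much work can be moved into the parallel phase is the main design trade-off.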