GPU architectures have become popular for executing general-purpose programs. Their many-core design supports a large number of concurrently running threads, which hides the latency between dependent instructions. In modern GPU architectures, each SM/core is typically composed of several sub-cores, each with its own independent pipeline. Simulators are a key tool for investigating novel concepts in computer architecture. To expose the different bottlenecks correctly, they must be performance-accurate and model the target hardware faithfully. This paper presents a broad analysis of different parts of Accel-Sim, a popular GPGPU simulator, and proposes several improvements to its model. First, we focus on the front-end and develop a more realistic model. Then, we analyze how the result bus works and develop a more realistic model of it. Next, we describe the current memory pipeline model and propose a model for a more cost-effective design. Finally, we discuss other areas in which the simulator could be improved.