Modern computer designs support composite prefetching, in which multiple individual prefetcher components target different memory access patterns. However, multiple prefetchers competing for resources can drastically hurt performance, especially in many-core systems where caches and other resources are shared and very limited. Prior work has proposed mitigating this issue by selectively enabling and disabling prefetcher components at runtime. Traditional approaches rely on heuristics that are hard to scale as core and prefetcher component counts increase. More recently, deep reinforcement learning has been proposed; however, it is too expensive to deploy in real-world many-core systems. In this work, we propose a new phase-based methodology for training a lightweight supervised learning model to manage composite prefetchers at runtime. Over its default prefetcher configuration, our approach improves the performance of a state-of-the-art many-core system by up to 25%, and by 2.7% on average.