Hardware prefetching is one of the most widely used techniques for hiding long data access latency. To address the challenges faced by hardware prefetching, architects have proposed detecting and exploiting spatial locality at the granularity of spatial regions. When a new region is activated, these prefetchers search for similar previously accessed regions and predict the footprint based on system-level environmental features such as the trigger instruction or data address. However, we find that such context-based prediction cannot capture the essential characteristics of access patterns, which limits flexibility and practicality and leads to suboptimal prefetching performance. In this paper, inspired by the temporal property of memory accesses, we note that the temporal correlation exhibited within a spatial footprint is a key feature of spatial patterns. To this end, we propose Gaze, a simple and efficient hardware spatial prefetcher that uses footprint-internal temporal correlations to characterize spatial patterns. We also observe a unique, unresolved challenge in utilizing spatial footprints generated by spatial streaming, which exhibit extremely high access density. We therefore further enhance Gaze with a dedicated two-stage approach that mitigates the over-prefetching problem commonly encountered in conventional schemes. Our comprehensive and diverse set of experiments shows that Gaze improves performance across a wider range of scenarios. Specifically, Gaze improves performance by 5.7% and 5.4% on single-core systems and by 11.4% and 8.8% on eight-core systems, compared to the most recent low-cost solutions, PMP and vBerti.
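For readers unfamiliar with region-based prefetching, the conventional context-based scheme described above (not Gaze itself) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the 4 KB region size, the 64-byte block size, the (trigger PC, trigger offset) context, and all class and method names are hypothetical choices made for the example, not details taken from this paper. Gaze's use of footprint-internal temporal correlation and its two-stage handling of dense footprints are not modeled here.

```python
# Illustrative sketch of a conventional context-based spatial footprint
# predictor. All sizes and names below are assumptions for this example.

REGION_SIZE = 4096                      # assumed spatial region size (bytes)
BLOCK_SIZE = 64                         # assumed cache-block size (bytes)
BLOCKS_PER_REGION = REGION_SIZE // BLOCK_SIZE


class ContextFootprintPredictor:
    def __init__(self):
        self.active = {}    # region number -> (context, footprint bitmap)
        self.history = {}   # context -> footprint bitmap last seen for it

    def access(self, pc, addr):
        """Record one access and return block addresses to prefetch."""
        region = addr // REGION_SIZE
        offset = (addr % REGION_SIZE) // BLOCK_SIZE

        if region in self.active:
            # Region already active: just accumulate the footprint.
            context, footprint = self.active[region]
            self.active[region] = (context, footprint | (1 << offset))
            return []

        # Region activation: the trigger access defines the context.
        context = (pc, offset)
        self.active[region] = (context, 1 << offset)

        # Predict the footprint from the last region seen with this context.
        predicted = self.history.get(context, 0)
        base = region * REGION_SIZE
        return [base + i * BLOCK_SIZE
                for i in range(BLOCKS_PER_REGION)
                if (predicted >> i) & 1 and i != offset]

    def deactivate(self, region):
        """End tracking of a region and store its footprint as history."""
        if region in self.active:
            context, footprint = self.active.pop(region)
            self.history[context] = footprint
```

In this sketch, a region whose trigger access matches a previously seen (PC, offset) pair is prefetched with the footprint recorded for that pair, regardless of how the earlier region was actually traversed. That dependence on the trigger context alone is the limitation the abstract attributes to context-based prediction.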