Data prefetching is a critical technique for bridging the processor-memory performance gap: it predicts future memory accesses and fetches data into on-chip caches before it is demanded. Traditional prefetchers based on next-line, stride, and correlation heuristics perform well on regular access patterns, but they are fundamentally inadequate for the irregular, data-dependent patterns prevalent in modern workloads such as graph analytics, sparse matrix computations, and pointer-intensive applications. This survey presents a systematic review of papers selected using a PRISMA-guided methodology. We propose a structured taxonomy that organizes prefetching techniques along three dimensions: locality type (spatial and temporal); implementation layer (hardware, software, and hybrid); and, for the increasingly important class of ML-based prefetchers, learning paradigm (supervised, reinforcement, and unsupervised learning) paired with training mode (online and offline). Through a multi-dimensional comparative analysis of ML-based prefetchers across storage overhead, accuracy, inference latency, hardware feasibility, and generalization ability, we identify three key findings: an accuracy-overhead Pareto frontier defined by model class, a natural architectural mapping between model complexity and cache hierarchy level, and a fundamental tension between runtime adaptability and model capacity that motivates hierarchical ensemble architectures.