SGDP: A Stream-Graph Neural Network Based Data Prefetcher

Data prefetching is important for storage system optimization and access performance improvement. Traditional prefetchers work well for mining sequential logical block address (LBA) access patterns but cannot handle the complex non-sequential patterns that commonly exist in real-world applications. The state-of-the-art (SOTA) learning-based prefetchers cover more LBA accesses. However, they do not adequately consider the spatial interdependencies between LBA deltas, which limits their performance and robustness. This paper proposes a novel Stream-Graph neural network-based Data Prefetcher (SGDP). Specifically, SGDP models LBA delta streams using a weighted directed graph structure to represent interactive relations among LBA deltas, and further extracts hybrid features with graph neural networks for data prefetching. We conduct extensive experiments on eight real-world datasets. Empirical results verify that, on average, SGDP outperforms the SOTA methods by 6.21% in hit ratio and by 7.00% in effective prefetching ratio, and speeds up inference by 3.13x. In addition, we generalize SGDP to different variants through different stream constructions, further expanding its application scenarios and demonstrating its robustness. SGDP offers a novel data prefetching solution and has been verified in commercial hybrid storage systems in the experimental phase. Our code and appendix are available at https://github.com/yyysjz1997/SGDP/.
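
To make the idea of modeling an LBA delta stream as a weighted directed graph more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the function names `lba_deltas` and `build_delta_graph`, the sample addresses, and the choice of edge weight (the number of times one delta immediately follows another) are all assumptions made for illustration. The abstract does not specify how SGDP constructs its graph or which features the graph neural network extracts from it.

```python
from collections import Counter


def lba_deltas(lbas):
    """Return the differences between consecutive logical block addresses."""
    return [b - a for a, b in zip(lbas, lbas[1:])]


def build_delta_graph(deltas):
    """Build a weighted directed graph over a delta stream.

    Nodes are the distinct delta values. An edge (u, v) means delta v
    immediately followed delta u in the stream, and its weight is the
    number of times that transition occurred (an illustrative choice,
    not necessarily the weighting used by SGDP).
    """
    nodes = sorted(set(deltas))
    edges = dict(Counter(zip(deltas, deltas[1:])))
    return nodes, edges


# Hypothetical access trace: strides of 8, 8, 4 repeating.
lbas = [100, 108, 116, 120, 128, 136, 140]
deltas = lba_deltas(lbas)
nodes, edges = build_delta_graph(deltas)

print(deltas)
print(nodes)
print(edges)
```

In this toy trace the repeating stride pattern shows up as a small graph with two nodes and three weighted edges. A structure of this kind is what a graph neural network could consume as input, whereas a purely sequential model would see only the ordered delta list.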

