Graph-related applications have grown significantly in both academia and industry, driven by the powerful representation capabilities of graphs. However, executing these applications efficiently faces several challenges, such as load imbalance and random memory access. To address these challenges, researchers have proposed various acceleration systems, including software frameworks and hardware accelerators, all of which incorporate graph pre-processing (GPP). GPP serves as a preparatory step before the formal execution of applications and involves techniques such as sampling and reordering. However, GPP execution is often overlooked, because the primary focus is on enhancing the graph applications themselves. This oversight is concerning, especially given the explosive growth of real-world graph data, where GPP becomes essential and can even dominate system running overhead. Furthermore, GPP methods vary significantly across devices and applications due to their high degree of customization. Unfortunately, no comprehensive work systematically summarizes GPP. To address this gap and foster a better understanding of GPP, we present a comprehensive survey dedicated to this area. We propose a double-level taxonomy of GPP that considers both algorithmic and hardware perspectives. By listing relevant works, we illustrate our taxonomy and conduct a thorough analysis and summary of diverse GPP techniques. Lastly, we discuss challenges in GPP and potential future directions.