Because semiconductor power density no longer stays constant as process technology scales down, modern CPUs are integrating capable data accelerators on chip to improve performance and efficiency across a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA), introduced in 4th Generation Intel Xeon Scalable CPUs (Sapphire Rapids). DSA targets in-memory data movement operations, which are common sources of overhead in datacenter workloads and infrastructure. It also gains versatility by supporting a wider range of operations on streaming data, such as CRC32 calculation, delta record creation and merging, and data integrity field (DIF) operations. This paper introduces the latest features supported by DSA, takes a deep dive into its versatility, and analyzes its throughput benefits through a comprehensive evaluation. Drawing on this analysis of its characteristics and on DSA's rich software ecosystem, we summarize several insights and guidelines that help programmers make the most of DSA, and we use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application.