With semiconductor power density no longer constant as process technology scales down, modern CPUs are integrating capable data accelerators on chip to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA), introduced in the 4th Generation Intel Xeon Scalable CPUs (Sapphire Rapids). DSA targets in-memory data movement operations, which are common sources of overhead in datacenter workloads and infrastructure. In addition, it is made more versatile by supporting a wider range of operations on streaming data, such as CRC32 calculations, delta record creation/merging, and data integrity field (DIF) operations. This paper introduces the latest features supported by DSA, takes a deep dive into its versatility, and analyzes its throughput benefits through a comprehensive evaluation. Drawing on this analysis of its characteristics and on DSA's rich software ecosystem, we summarize several insights and guidelines to help programmers make the most of DSA, and use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application.