As semiconductor power density no longer stays constant while process technology scales down, modern CPUs are integrating capable data accelerators on chip to improve performance and efficiency across a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA), introduced in 4th Generation Intel Xeon Scalable CPUs (Sapphire Rapids). DSA targets in-memory data movement operations, which are common sources of overhead in datacenter workloads and infrastructure. Beyond data movement, it supports a wider range of operations on streaming data, such as CRC32 calculations, delta record creation/merging, and data integrity field (DIF) operations, which makes it much more versatile. This paper introduces the latest features supported by DSA, examines its versatility in depth, and analyzes its throughput benefits through a comprehensive evaluation. Drawing on this analysis of its characteristics and on DSA's rich software ecosystem, we summarize several insights and guidelines to help programmers make the most of DSA, and we use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application.