Content-defined chunking (CDC) algorithms determine the overall space savings that deduplication systems achieve. However, because they must scan each file in its entirety, they are slow and are often the main performance bottleneck in data deduplication. We present VectorCDC, a method to accelerate hashless CDC algorithms using vector CPU instructions such as SSE and AVX. We analyzed state-of-the-art chunking algorithms and found that hashless algorithms primarily use two data processing patterns to identify chunk boundaries: Extreme Byte Searches and Range Scans. VectorCDC provides a vector-friendly approach to accelerating these two patterns. Using VectorCDC, we accelerated three state-of-the-art hashless chunking algorithms: RAM, AE, and MAXP. Our evaluation shows that VectorCDC is effective on Intel, AMD, ARM, and IBM CPUs, achieving 8.35x to 26.2x higher throughput than existing vector-accelerated algorithms and 15.3x to 207.2x higher throughput than existing unaccelerated algorithms. VectorCDC achieves this without affecting deduplication space savings.
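To make the two patterns named above concrete, the following is a minimal sketch of how an extreme byte search and a range scan might each be expressed with SSE2 intrinsics. It is an illustration under stated assumptions, not the VectorCDC implementation: the function names `max_byte` and `first_ge` are hypothetical, and reading a "range scan" as "find the first byte at or above a threshold" is one plausible interpretation rather than a definition taken from the abstract. A scalar fallback is compiled when SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper (extreme byte search): largest byte in buf[0..len).
 * Returns 0 for an empty buffer. */
uint8_t max_byte(const uint8_t *buf, size_t len) {
    assert(buf != NULL || len == 0);
    uint8_t best = 0;
    size_t i = 0;
#if defined(__SSE2__)
    if (len >= 16) {
        __m128i vmax = _mm_setzero_si128();
        /* Compare 16 bytes per iteration, keeping a per-lane running maximum. */
        for (; i + 16 <= len; i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
            vmax = _mm_max_epu8(vmax, v);
        }
        /* Reduce the 16 lane maxima to a single value. */
        uint8_t lanes[16];
        _mm_storeu_si128((__m128i *)lanes, vmax);
        for (int k = 0; k < 16; k++) {
            if (lanes[k] > best) best = lanes[k];
        }
    }
#endif
    /* Scalar tail (or the whole buffer when SSE2 is unavailable). */
    for (; i < len; i++) {
        if (buf[i] > best) best = buf[i];
    }
    return best;
}

/* Hypothetical helper (range scan): index of the first byte >= threshold
 * in buf[0..len), or len if no such byte exists. */
size_t first_ge(const uint8_t *buf, size_t len, uint8_t threshold) {
    assert(buf != NULL || len == 0);
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i vt = _mm_set1_epi8((char)threshold);
    for (; i + 16 <= len; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        /* Unsigned max(v, t) equals v exactly in the lanes where v >= t. */
        __m128i ge = _mm_cmpeq_epi8(_mm_max_epu8(v, vt), v);
        int mask = _mm_movemask_epi8(ge);  /* one bit per byte lane */
        if (mask != 0) {
            size_t lane = 0;
            while (((mask >> lane) & 1) == 0) lane++;  /* lowest set bit */
            return i + lane;
        }
    }
#endif
    for (; i < len; i++) {
        if (buf[i] >= threshold) return i;
    }
    return len;
}
```

In this sketch, a chunker in the style of the hashless algorithms could call `max_byte` over a fixed window and then `first_ge` over the bytes that follow to locate a candidate boundary; how RAM, AE, and MAXP actually combine these steps is not specified in the abstract.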