Reed-Solomon (RS) codes have been increasingly adopted by distributed storage systems in place of replication, because they provide the same level of availability with much lower storage overhead. However, a key drawback of these RS-coded distributed storage systems is the high latency of degraded reads, which can be triggered by data failures or hot spots and are not rare in production environments. To address this issue, we propose a novel parallel reconstruction solution called APLS. APLS leverages all surviving source nodes to send the data needed by degraded reads and chooses lightly loaded starter nodes to receive the reconstructed data of those degraded reads, thereby reducing the latency of degraded reads. Prototype-based experiments compare APLS with ECPipe, the state-of-the-art solution for improving the latency of degraded reads. The experimental results demonstrate that APLS effectively reduces the latency, particularly under heavy or medium workloads.
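The two ideas in the abstract, spreading reconstruction traffic over all surviving source nodes and sending the result to a lightly loaded starter node, can be illustrated with a minimal planning sketch. It assumes an RS code that needs any `k` blocks to reconstruct a lost block, splits the lost block into `num_slices` slices, and rotates which `k` surviving nodes serve each slice so that every survivor contributes. The rotation rule and the helper names (`assign_slices`, `per_node_traffic`, `choose_starter`) are hypothetical illustrations, not the APLS algorithm itself, and the actual RS decoding arithmetic is omitted.

```python
from typing import Dict, List, Sequence


def assign_slices(surviving: Sequence[str], k: int, num_slices: int) -> List[List[str]]:
    """Hypothetical plan: for each slice, pick k consecutive survivors at a rotating offset.

    A conventional degraded read uses the same k helpers for the whole block;
    rotating the offset spreads the sending work across all surviving nodes.
    """
    n = len(surviving)
    if n < k:
        raise ValueError("need at least k surviving nodes to reconstruct")
    plan = []
    for s in range(num_slices):
        start = (s * k) % n  # rotate the starting helper for each slice
        plan.append([surviving[(start + i) % n] for i in range(k)])  # k distinct nodes
    return plan


def per_node_traffic(plan: List[List[str]]) -> Dict[str, int]:
    """Count how many slices each surviving node sends under the plan."""
    counts: Dict[str, int] = {}
    for helpers in plan:
        for node in helpers:
            counts[node] = counts.get(node, 0) + 1
    return counts


def choose_starter(candidates: Sequence[str], load: Dict[str, int]) -> str:
    """Pick the candidate with the lowest current load (ties go to the first listed)."""
    return min(candidates, key=lambda node: load[node])


if __name__ == "__main__":
    survivors = ["N0", "N1", "N2", "N3", "N4"]  # e.g. 6 blocks per stripe, k = 4, one lost
    plan = assign_slices(survivors, k=4, num_slices=5)
    print(per_node_traffic(plan))  # each survivor sends 4 of the 5 slices
    print(choose_starter(["S0", "S1", "S2"], {"S0": 5, "S1": 2, "S2": 7}))
```

In this toy configuration each of the five survivors sends 4/5 of a block's worth of slices, whereas a conventional degraded read would have four helpers each send a full block while the fifth stays idle. How APLS actually partitions the work and measures node load is described in the paper, not here.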