Coding schemes for distributed storage systems are designed against several performance metrics. The first is repair bandwidth, which measures the network resources consumed during the repair process. Another critical metric for repair efficiency is disk I/O cost, defined as the number of data packets accessed at the helper nodes to repair a failed node. In a coding scheme with optimal I/O cost, the number of packets sent to the newcomer is exactly the number of packets read from memory. This mode of repair is called uncoded repair, because the helper nodes perform no coding operations. Besides minimizing disk I/O cost, an uncoded repair mechanism incurs minimal computational overhead at the helper nodes. In this paper, we show that for single-node failures, if all surviving nodes participate in the repair of the failed node, every point on the fundamental tradeoff curve between storage and repair bandwidth can be achieved. The proposed coding scheme is built on the theory of gammoids, a specialized class of graph-based matroids. We prove that the scheme tolerates an unlimited number of node repair iterations over a field of fixed size.