We consider replication-based distributed storage systems in which each node stores the same amount of data and each stored data bit has the same replication factor across the nodes. Such systems are referred to as balanced distributed databases. When existing nodes leave or new nodes join the system, the database loses its balance, either because the replication factor is reduced or because storage across the nodes becomes non-uniform. This triggers a rebalancing algorithm, which exchanges data between the nodes to restore the balance of the database. The goal is to design rebalancing schemes with minimal communication load. In recent work by Krishnan et al., coded transmissions were used to rebalance a carefully designed distributed database after a node removal or addition. These coded rebalancing schemes achieve optimal communication load but require the file size to be at least exponential in the system parameters. In this work, we consider a cyclic balanced database (in which data is placed cyclically across the system nodes) and present coded rebalancing schemes for node removal and addition in such a database. These databases (and the associated rebalancing schemes) require the file size to be only cubic in the number of nodes in the system. We bound the advantage of our node-removal rebalancing scheme over the uncoded scheme and show that our scheme has a smaller communication load. For node addition, the rebalancing scheme presented is a simple uncoded scheme, which we show to have optimal load. Finally, for the specific data placements used by our achievable rebalancing schemes, we derive a lower bound on the single-node-removal rebalancing load and show that our achievable loads are within a multiplicative gap of this lower bound.
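To make "balanced" and "cyclic placement" concrete, the sketch below uses a deliberately simplified, hypothetical placement rule that is an assumption and not the construction from this work: the file is split into `num_nodes` equal segments, and segment `s` is stored on nodes `s, s+1, ..., s+replication-1` (mod `num_nodes`). The helper names `cyclic_placement`, `is_balanced`, and `remove_node` are illustrative only. The toy uses just `num_nodes` segments, whereas the schemes described above require a file size cubic in the number of nodes. The sketch only shows how removing a node leaves storage uniform but breaks uniform replication, which is what triggers rebalancing.

```python
from collections import Counter


def cyclic_placement(num_nodes: int, replication: int) -> dict[int, list[int]]:
    """Map each node to the segment indices it stores.

    Hypothetical cyclic rule (an assumption for illustration): the file is split
    into num_nodes equal segments, and segment s is stored on nodes
    s, s+1, ..., s+replication-1, with indices taken mod num_nodes.
    """
    if not 1 <= replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    placement: dict[int, list[int]] = {node: [] for node in range(num_nodes)}
    for seg in range(num_nodes):
        for offset in range(replication):
            placement[(seg + offset) % num_nodes].append(seg)
    # Sort each node's segment list so the output is easy to read.
    return {node: sorted(segs) for node, segs in placement.items()}


def is_balanced(placement: dict[int, list[int]], num_segments: int) -> bool:
    """Balanced means every node stores the same number of segments and every
    segment (including any with zero copies) has the same replication factor."""
    sizes = {len(segs) for segs in placement.values()}
    counts = Counter(seg for segs in placement.values() for seg in segs)
    replication = {counts[seg] for seg in range(num_segments)}  # missing -> 0
    return len(sizes) == 1 and len(replication) == 1


def remove_node(placement: dict[int, list[int]], node: int):
    """Drop a node; return the remaining placement and the segments that lost
    one replica (exactly the segments the removed node was storing)."""
    remaining = {n: segs for n, segs in placement.items() if n != node}
    return remaining, sorted(placement[node])


if __name__ == "__main__":
    placement = cyclic_placement(num_nodes=5, replication=3)
    print(placement[0], is_balanced(placement, 5))
    remaining, under_replicated = remove_node(placement, 0)
    print(under_replicated, is_balanced(remaining, 5))
```

With 5 nodes and replication factor 3, node 0 stores segments 0, 3, and 4. After node 0 leaves, every remaining node still stores 3 segments, but those three segments now have only 2 copies each, so the database is no longer balanced and a rebalancing step must restore it.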