Worldwide, storage demands and costs are increasing. As a consequence of fault tolerance, storage device heterogeneity, and data-center-specific constraints, the balancing algorithm built into the distributed storage cluster system Ceph cannot achieve optimal storage capacity utilization. This work presents Equilibrium, a shard balancing algorithm that is aware of device utilization and shard size. Through extensive experiments, we demonstrate that the proposed algorithm achieves near-optimal balance on real-world clusters, substantially increasing available storage capacity while reducing the amount of data movement required.