Most large-scale storage systems employ erasure coding to provide resilience against disk failures. Recent work has shown that tuning this redundancy to changes in disk failure rates leads to substantial storage savings. This tuning requires code conversion, a resource-intensive operation in which data encoded under an $[n^{I\mskip-2mu},k^{I\mskip-2mu}]$ initial code is transformed into data encoded under an $[n^{F\mskip-2mu},k^{F\mskip-2mu}]$ final code. Convertible codes are a class of codes that enable efficient code conversion while maintaining other desirable properties. In this paper, we focus on the access cost of conversion (the total number of code symbols accessed during conversion) and on an important subclass of conversions known as the merge regime, in which multiple initial codewords are combined into a single final codeword. In this setting, explicit constructions of systematic access-optimal Maximum Distance Separable (MDS) convertible codes are known for all parameters. However, the existing construction for a key subset of these parameters, which uses Vandermonde parity matrices, requires a large field size, making it unsuitable for practical applications. We provide (1) sharper bounds on the minimum field size required by such codes, and (2) explicit low-field-size constructions for several parameter ranges. In doing so, we prove the super-regularity of specially designed classes of Vandermonde matrices, a result that may be of independent interest.
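To make the field-size issue concrete, the sketch below brute-forces the super-regularity check for a small Vandermonde-type parity matrix. It assumes super-regularity means that every square submatrix is nonsingular, which is the standard condition for a systematic code with parity matrix $P$ to be MDS. The exact definition and the specially designed Vandermonde classes used in this paper may differ. The helper names (`det_mod_p`, `vandermonde`, `is_superregular`) are hypothetical. The sketch also works only over prime fields GF(p), not the extension fields GF(2^m) common in practice, and its running time is exponential in the matrix size, so it is meant only for tiny parameters.

```python
from itertools import combinations


def det_mod_p(matrix, p):
    """Determinant of a square matrix over GF(p), p prime, via Gaussian elimination."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    det = 1
    for col in range(n):
        # Find a row with a nonzero (mod p) entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] % p != 0), None)
        if pivot is None:
            return 0  # singular over GF(p)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], p - 2, p)  # inverse via Fermat's little theorem
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p


def vandermonde(points, rows, p):
    """Hypothetical layout: entry (i, j) = points[j] ** i mod p."""
    return [[pow(x, i, p) for x in points] for i in range(rows)]


def is_superregular(matrix, p):
    """True iff every square submatrix is nonsingular over GF(p) (brute force)."""
    nrows, ncols = len(matrix), len(matrix[0])
    for size in range(1, min(nrows, ncols) + 1):
        for rs in combinations(range(nrows), size):
            for cs in combinations(range(ncols), size):
                sub = [[matrix[r][c] for c in cs] for r in rs]
                if det_mod_p(sub, p) == 0:
                    return False
    return True


if __name__ == "__main__":
    # Two rows, distinct nonzero points: every minor is 1 or (b - a), so it passes.
    print(is_superregular(vandermonde([1, 2, 3], 2, 7), 7))
    # Three rows over GF(5): points 1 and 4 = -1 share the same square, so the
    # 2x2 minor on rows {0, 2} and columns {0, 3} vanishes.
    print(is_superregular(vandermonde([1, 2, 3, 4], 3, 5), 5))
```

The second call shows why a small field can break a plain Vandermonde parity matrix: when both $x$ and $-x$ appear as evaluation points, the rows for $x^0$ and $x^2$ agree on those two columns, which yields a singular $2 \times 2$ submatrix.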