In a distributed storage system serving hot data, data recovery performance becomes important; it is captured, for example, by the service rate. We give partial evidence that a sequence of equal user requests (as in the PIR coding regime) is the hardest to serve, both for concrete and for random user requests and server contents. We prove that a constant request sequence is locally hardest to serve: if enough copies of each vector are stored in the servers and a request sequence with all requests equal can be served, then it can still be served after a few requests are changed. For random i.i.d. server contents, with the number of data symbols constant (for simplicity) and the number of servers growing, we show that the maximum number of user requests we can serve, divided by the number of servers we need, approaches a limit almost surely. For uniform server contents, we show that this limit is 1/2, both for sequences of copies of a fixed request and for sequences of arbitrary requests, so serving equal requests is at least as hard as serving arbitrary requests. For i.i.d. requests independent of the uniform server contents, the limit is at least 1/2, and it equals 1/2 if the requests are all equal to a fixed request almost surely, which confirms the same conclusion. As a building block, we deduce from a 1952 result of Marshall Hall, Jr. on abelian groups that any collection of half as many requests as coded symbols in the doubled binary simplex code can be served by this code. This implies the fractional version of the Functional Batch Code Conjecture that allows half-servers.
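The building-block claim about the doubled binary simplex code can be checked by brute force for very small parameters. The sketch below is only an illustration and is not the argument used in the paper. It makes several assumptions: vectors of F_2^k are encoded as integer bitmasks, a request is a nonzero vector, and a request is considered served when it equals the XOR of the contents of a recovery set of one or two servers, with recovery sets pairwise disjoint. The function names `doubled_simplex`, `serve`, and `all_served` are hypothetical.

```python
from collections import Counter
from itertools import product


def doubled_simplex(k):
    """Server contents: each nonzero vector of F_2^k (as a bitmask), stored twice."""
    return [v for v in range(1, 2 ** k) for _ in range(2)]


def serve(requests, stock):
    """Backtracking search for pairwise disjoint recovery sets of size 1 or 2.

    stock[v] is the number of still-unused servers storing vector v.
    Returns one recovery set (a tuple of vectors) per request, or None.
    """
    if not requests:
        return []
    r, rest = requests[0], requests[1:]
    # Either a server that stores r itself, or two servers storing u and u ^ r.
    candidates = [(r,)] + [(u, u ^ r) for u in stock if u != r and u < (u ^ r)]
    for cand in candidates:
        if all(stock[v] > 0 for v in cand):
            for v in cand:
                stock[v] -= 1
            tail = serve(rest, stock)
            for v in cand:  # undo the choice before trying the next candidate
                stock[v] += 1
            if tail is not None:
                return [cand] + tail
    return None


def all_served(k):
    """Check every sequence of n/2 nonzero requests, where n = 2 * (2^k - 1)."""
    n = 2 * (2 ** k - 1)
    for requests in product(range(1, 2 ** k), repeat=n // 2):
        if serve(requests, Counter(doubled_simplex(k))) is None:
            return False
    return True


print(all_served(2))  # 6 servers, every sequence of 3 requests
```

For k = 2 the check covers all 27 request sequences. Exhaustive enumeration grows quickly with k, so for larger k it is more practical to call `serve` on individual sequences, for example seven copies of the same request with k = 3.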