We consider the problem of geographically distributed data storage in a network of servers (or nodes), where the nodes are connected to each other via communication links with known round-trip times (RTTs). Each node serves a specific set of clients, and a client can request any of the files available in the distributed system. The node serving the client (its parent node) provides the requested file if it is available locally; otherwise, it contacts other nodes that hold the data needed to retrieve the requested file. This inter-node communication incurs a delay, which adds latency to servicing the data request. The worst-case latency incurred at a servicing node and the system average latency are important performance metrics of a storage system; they depend not only on the inter-node RTTs but also on how the data is stored across the nodes. Data files can be placed on the nodes as they are, i.e., in uncoded fashion, or they can be coded before placement. This paper provides necessary and sufficient conditions for the existence of uncoded storage schemes that are optimal in terms of both per-node worst-case latency and system average latency. In addition, the paper provides efficient binary storage codes for a specific case where optimal uncoded schemes do not exist.
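To make the two metrics concrete, the following is a minimal sketch of how they could be evaluated for an uncoded placement. It rests on assumptions that the abstract does not state: the latency of a request is taken to be the RTT from the parent node to the nearest node holding the file (zero if the file is local), all files are requested equally often at every node, and the function names `file_latency`, `worst_case_latency`, and `average_latency` are hypothetical.

```python
def file_latency(rtt, placement, node, file_id):
    """RTT from `node` to the nearest node storing `file_id` (0 if local).

    Assumption: latency equals the RTT to the closest holder of the file.
    """
    holders = [j for j, files in enumerate(placement) if file_id in files]
    if not holders:
        raise ValueError(f"file {file_id!r} is not stored on any node")
    return min(rtt[node][j] for j in holders)


def worst_case_latency(rtt, placement, node, all_files):
    """Largest latency that `node` incurs over all files in the system."""
    return max(file_latency(rtt, placement, node, f) for f in all_files)


def average_latency(rtt, placement, all_files):
    """Mean latency over all (node, file) pairs.

    Assumption: every file is requested equally often at every node.
    """
    n = len(rtt)
    total = sum(
        file_latency(rtt, placement, i, f)
        for i in range(n)
        for f in all_files
    )
    return total / (n * len(all_files))


# Illustrative three-node network; the RTT values (in ms) are made up.
rtt = [
    [0, 10, 20],
    [10, 0, 15],
    [20, 15, 0],
]
all_files = ["A", "B"]
# placement[i] is the set of files stored, uncoded, on node i.
placement = [{"A"}, {"B"}, {"A"}]

per_node_worst = [
    worst_case_latency(rtt, placement, i, all_files) for i in range(len(rtt))
]
print(per_node_worst)
print(average_latency(rtt, placement, all_files))
```

In this toy instance node 2 must fetch file B from node 1, so its worst-case latency is the 15 ms RTT on that link, while nodes 0 and 1 each see a worst case of 10 ms. Comparing these numbers across candidate placements is one way to see how the storage scheme, and not only the RTTs, shapes both metrics.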