Modern storage systems, often deployed to support multiple tenants in the cloud, must provide performance isolation. Unfortunately, traditional approaches such as fair sharing do not provide performance isolation for storage systems, because their resources (e.g., write buffers and read caches) exhibit high preemption delays. These delays lead to unacceptable spikes in client tail latencies, as clients may be forced to wait arbitrarily long to receive their fair share of resources. We introduce Delta Fair Sharing, a family of algorithms for sharing resources with high preemption delays. These algorithms satisfy two key properties: $δ$-fairness, which bounds a client's delay in receiving its fair share of resources to $δ$ time units, and $δ$-Pareto-efficiency, which allocates unused resources to clients with unmet demand. Together, these properties capture resource-acquisition delays end-to-end, bound well-behaved clients' tail-latency spikes to $δ$ time units, and ensure high utilization. We implement such algorithms in FAIRDB, an extension of RocksDB. Our evaluation shows that FAIRDB isolates well-behaved clients from high-demand workloads better than state-of-the-art alternatives.
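To make the $δ$-fairness property concrete, the following minimal sketch checks it against a recorded allocation trace. It is only one possible reading of the property, not the paper's formal definition or the FAIRDB implementation. It assumes discrete time steps, a fixed per-client fair share, and a client that is owed no more than it demands. The names `max_fair_share_delay` and `is_delta_fair` are hypothetical and introduced here for illustration.

```python
from typing import Sequence


def max_fair_share_delay(
    demand: Sequence[float],
    alloc: Sequence[float],
    fair_share: float,
) -> int:
    """Return the longest run of consecutive time steps in which the client
    held less than min(demand, fair_share).

    Assumption (not from the paper): time is discrete, and demand[t] and
    alloc[t] are the client's demand and allocation at step t.
    """
    worst = 0
    current = 0
    for d, a in zip(demand, alloc):
        entitled = min(d, fair_share)  # a client is never owed more than it asks for
        if a < entitled:
            current += 1               # still waiting for its fair share
            worst = max(worst, current)
        else:
            current = 0                # fair share (or full demand) satisfied
    return worst


def is_delta_fair(
    demand: Sequence[float],
    alloc: Sequence[float],
    fair_share: float,
    delta: int,
) -> bool:
    """True if the client never waits more than `delta` steps for its share."""
    return max_fair_share_delay(demand, alloc, fair_share) <= delta


# Hypothetical trace: the client becomes active at step 2 and must wait for
# another tenant's resources (e.g., a write buffer) to be reclaimed.
demand = [0, 0, 8, 8, 8, 8]
alloc = [0, 0, 2, 3, 5, 5]
delay = max_fair_share_delay(demand, alloc, fair_share=5)
```

In this trace the client holds less than its fair share of 5 units for two consecutive steps, so the trace satisfies the check for `delta=2` but not for `delta=1`. Plain fair sharing places no bound on this wait when resources are slow to preempt, which is the gap the abstract describes.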