SwitchFS: Asynchronous Metadata Updates for Distributed Filesystems with In-Network Coordination

Distributed filesystem metadata updates are typically synchronous. This creates inherent challenges for access efficiency, load balancing, and directory contention, especially under dynamic and skewed workloads. This paper argues that synchronous updates are overly conservative. We propose SwitchFS with asynchronous metadata updates that allow operations to return early and defer directory updates until reads, both hiding latency and amortizing overhead. The key challenge lies in efficiently maintaining the synchronous POSIX semantics of metadata updates. To address this, SwitchFS is co-designed with a programmable switch, leveraging the limited on-switch resources to track directory states with negligible overhead. This allows SwitchFS to aggregate and apply delayed updates efficiently, using batching and consolidation before directory reads. Evaluation shows that SwitchFS achieves up to 13.34$\times$ and 3.85$\times$ higher throughput, and 61.6% and 57.3% lower latency than two state-of-the-art distributed filesystems, Emulated-InfiniFS and Emulated-CFS, respectively, under skewed workloads. For real-world workloads, SwitchFS improves end-to-end throughput by 21.1$\times$, 1.1$\times$, and 0.3$\times$ over CephFS, Emulated-InfiniFS, and Emulated-CFS, respectively.
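
The deferral idea in the abstract can be illustrated with a small single-process sketch. This is not SwitchFS's implementation: the class `ToyAsyncMetadataStore`, its method names, the in-memory `dirty` set standing in for directory state tracked on the switch, and the "last operation per name wins" consolidation rule are all assumptions made for illustration. Error handling (for example, rejecting a create of an existing name) is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Directory:
    entries: set = field(default_factory=set)
    mtime: int = 0


class ToyAsyncMetadataStore:
    """Hypothetical toy model: updates return early, directories catch up on read."""

    def __init__(self):
        self.dirs = defaultdict(Directory)
        self.pending = defaultdict(list)  # dir path -> [(op, name, timestamp)]
        self.dirty = set()                # stand-in for switch-tracked directory state
        self.clock = 0                    # logical clock used as mtime

    def _log(self, op, dirpath, name):
        # Record the update and mark the directory dirty; the directory
        # itself is not touched, so the caller can return immediately.
        self.clock += 1
        self.pending[dirpath].append((op, name, self.clock))
        self.dirty.add(dirpath)

    def create(self, dirpath, name):
        self._log("create", dirpath, name)

    def unlink(self, dirpath, name):
        self._log("unlink", dirpath, name)

    def readdir(self, dirpath):
        # A directory read must observe all earlier updates, so deferred
        # updates are applied first if the directory is marked dirty.
        if dirpath in self.dirty:
            self._flush(dirpath)
        d = self.dirs[dirpath]
        return sorted(d.entries), d.mtime

    def _flush(self, dirpath):
        ops = self.pending.pop(dirpath, [])
        # Consolidation: only the last operation per name matters.
        last = {}
        for op, name, _ts in ops:
            last[name] = op
        d = self.dirs[dirpath]
        # Batching: apply the consolidated set in one pass.
        for name, op in last.items():
            if op == "create":
                d.entries.add(name)
            else:
                d.entries.discard(name)
        if ops:
            d.mtime = ops[-1][2]  # one mtime update for the whole batch
        self.dirty.discard(dirpath)
```

In this sketch, three updates to the same directory cost three log appends and a single directory modification at read time, which is the amortization the abstract refers to; a reader still sees a result consistent with applying every update in order.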
