Sharpen the Spec, Cut the Code: A Case for Generative File System with SYSSPEC

File systems are critical OS components that require constant evolution to support new hardware and emerging application needs. However, the traditional paradigm of developing features, fixing bugs, and maintaining the system incurs significant overhead, especially as systems grow in complexity. This paper proposes a new paradigm, generative file systems, which leverages Large Language Models (LLMs) to generate and evolve a file system from prompts, effectively addressing the need for robust evolution. Despite the widespread success of LLMs in code generation, attempts to create a functional file system have thus far been unsuccessful, mainly due to the ambiguity of natural language prompts. This paper introduces SYSSPEC, a framework for developing generative file systems. Its key insight is to replace ambiguous natural language with principles adapted from formal methods. Instead of imprecise prompts, SYSSPEC employs a multi-part specification that accurately describes a file system's functionality, modularity, and concurrency. The specification acts as an unambiguous blueprint, guiding LLMs to generate expected code flexibly. To manage evolution, we develop a DAG-structured patch that operates on the specification itself, enabling new features to be added without violating existing invariants. Moreover, the SYSSPEC toolchain features a set of LLM-based agents with mechanisms to mitigate hallucination during construction and evolution. We demonstrate our approach by generating SPECFS, a concurrent file system. SPECFS demonstrates equivalent level of correctness to that of a manually-coded baseline across hundreds of regression tests. We further confirm its evolvability by seamlessly integrating 10 real-world features from Ext4. Our work shows that a specification-guided approach makes generating and evolving complex systems not only feasible but also highly effective.

翻译：文件系统是操作系统中的关键组件，需要持续演进以支持新硬件和新兴应用需求。然而，传统模式下开发功能、修复漏洞和维护系统会产生显著开销，尤其在系统复杂性日益增长时更为突出。本文提出一种新范式——生成式文件系统，其利用大语言模型（LLMs）通过提示词生成并演进文件系统，从而有效应对稳健演进的需求。尽管LLMs在代码生成领域已取得广泛成功，但迄今为止构建功能性文件系统的尝试均未成功，主要源于自然语言提示的模糊性。本文介绍SYSSPEC——一个用于开发生成式文件系统的框架。其核心洞见在于采用形式化方法衍生的原则替代模糊的自然语言。SYSSPEC采用多部分组成的规范来精确描述文件系统的功能、模块化与并发性，以此取代不精确的提示词。该规范作为明确的蓝图，灵活引导LLMs生成预期代码。为管理演进过程，我们开发了一种作用于规范本身的DAG结构补丁，使得新功能可在不破坏现有不变量的前提下添加。此外，SYSSPEC工具链配备了一组基于LLM的智能体，其内置机制可在系统构建与演进过程中缓解幻觉问题。我们通过生成并发文件系统SPECFS来验证所提方法。SPECFS在数百项回归测试中展现出与手动编码基准系统相当的正确性水平。我们通过无缝集成来自Ext4的10项真实世界功能，进一步证实了其可演进性。本研究表明，规范引导的方法使得生成与演进复杂系统不仅可行，而且极具成效。