Unix Tools and the FITO Category Mistake: Crash Consistency and the Protocol Nature of Persistence

Unix tools such as ls, cp, mv, and rename expose a filesystem abstraction that appears to present a single, authoritative state evolving through atomic transitions. This abstraction is false. We present a systematic Forward-In-Time-Only (FITO) analysis demonstrating that the assumption of instantaneous atomic state transitions constitutes a category mistake at every layer of the computing stack -- from ext4 journaling and delayed allocation, through fsync failure semantics, NVMe Flush/FUA device behavior, and Linux restartable sequences, down to the x86-64 CPU's own inability to guarantee atomic supervisor entry under Non-Maskable Interrupts. We prove a formal impossibility result: no syscall-based persistence primitive can define a commit boundary under failure, because the syscall return value is consistent with multiple materially different persistence states across Linux filesystems. We identify cross-layer temporal assumption leakage as the structural mechanism by which the category mistake propagates, and show that the entire storage stack forms a recursive chain of non-atomic dependencies whose apparent atomicity reflects mathematical impossibility (Herlihy, 1991), not merely engineering deficiency. An appendix documents the real-world consequences: cascading cloud outages at Google, AWS, Meta, and Cloudflare driven by retry amplification; database corruption from fsync failures in PostgreSQL, etcd, and MySQL; silent data corruption at CERN, NetApp, and Meta; AI training waste consuming 12--43% of compute budgets at scale; and financial system failures totaling billions of dollars annually. These consequences trace to a single structural cause: systems designed around the FITO assumption, compensating for its failure with retry-and-recover protocols that amplify the very failures they attempt to mask.

翻译：ls、cp、mv、rename等Unix工具所呈现的文件系统抽象，似乎展示了一个通过原子性变迁演进的单一权威状态。这种抽象是错误的。我们提出一种系统性的"仅向前时间"（FITO）分析，证明瞬时原子状态变迁的假设在计算栈的每一层都存在范畴错误——从ext4日志记录与延迟分配，到fsync故障语义、NVMe Flush/FUA设备行为、Linux可重启序列，直至x86-64 CPU自身在不可屏蔽中断下无法保证原子性的监管程序入口。我们证明了一个形式化的不可能性结果：任何基于系统调用的持久化原语都无法在故障下定义提交边界，因为系统调用的返回值在Linux文件系统中与多个实质不同的持久化状态保持一致。我们识别出跨层时间假设泄漏是该范畴错误传播的结构机制，并证明整个存储栈形成了递归的非原子依赖链，其表面上的原子性反映的是数学上的不可能性（Herlihy, 1991），而不仅仅是工程缺陷。附录记录了现实世界的后果：由重试放大引发的Google、AWS、Meta和Cloudflare级联云故障；PostgreSQL、etcd和MySQL因fsync故障导致的数据库损坏；CERN、NetApp和Meta的静默数据损坏；大规模AI训练浪费消耗12-43%的计算预算；以及每年造成数十亿美元损失的金融系统故障。这些后果可追溯至单一结构成因：围绕FITO假设设计的系统，通过重试恢复协议来补偿其失效，却放大了它们试图掩盖的故障。