Storage systems are fundamental to modern computing infrastructure, yet ensuring their correctness remains challenging in practice. Despite decades of research on system testing, many storage-system failures (including durability, ordering, recovery, and consistency violations) remain difficult to expose systematically. This difficulty stems primarily not from insufficient testing tooling, but from intrinsic properties of storage-system execution, including nondeterministic interleavings, long-horizon state evolution, and correctness semantics that span multiple layers and execution phases. This survey adopts a storage-centric view of system testing and organizes existing techniques according to the execution properties and failure mechanisms they target. We review a broad spectrum of approaches, ranging from concurrency testing and long-running workloads to crash-consistency analysis, hardware-level semantic validation, and distributed fault injection, and analyze their fundamental strengths and limitations. Within this framework, we examine fuzzing as an automated testing paradigm, highlight systematic mismatches between conventional fuzzing assumptions and storage-system semantics, and discuss how recent advances in artificial intelligence may complement fuzzing through state-aware and semantic guidance. Overall, this survey provides a unified perspective on storage-system correctness testing and outlines key challenges