Fault injectors are essential tools for evaluating the reliability and resilience of computing systems. They enable the simulation of hardware and software faults to analyze system behavior under error conditions and assess its ability to operate correctly despite disruptions. Such analysis is critical for identifying vulnerabilities and improving system robustness. CHAOS is a modular, open-source, and fully configurable fault injection framework designed for the gem5 simulator. It facilitates precise and systematic fault injection across multiple architectural levels, supporting comprehensive evaluations of fault tolerance mechanisms and resilience strategies. Its high configurability and seamless integration with gem5 allow researchers to explore a wide range of fault models and complex scenarios, making CHAOS a valuable tool for advancing research in dependable and high-performance computing systems.