Ensuring the reliability and resilience of modern web applications remains a critical challenge because of growing system complexity and dynamic runtime environments. This study proposes a modular self-healing framework based on the monitor-analyze-plan-execute over a shared knowledge base (MAPE-K) model, integrated with an AutoFix-inspired mechanism for adaptive fault recovery. Following a design and development research (DDR) approach, the system was implemented and evaluated through controlled fault injection experiments across twenty runtime failure scenarios, including service crashes, memory leaks, and database disconnections. The framework achieved a mean fault detection F1-score of 90.7% and a recovery success rate of 93.2%. The AutoFix module reduced the average time-to-recovery (TTR) by 56.2%, to 3.92 seconds. Under fault conditions, system throughput remained between 88% and 95%, and response time increased by only 3.1%. In addition, iterative feedback mechanisms improved recovery efficiency by 18.6% over multiple cycles. These findings indicate that the framework offers a practical and extensible way to improve fault tolerance in web applications through feedback-driven adaptation. Although the current implementation relies on predefined recovery strategies, its learning-oriented feedback lays a foundation for more autonomous self-healing systems.
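To make the MAPE-K cycle with feedback concrete, the following is a minimal Python sketch of one way such a loop could be organized. It is not the authors' implementation: the simulated service dictionary, the fault names, the 900 MB memory threshold, the strategy names, and every function and class name (`Knowledge`, `monitor`, `analyze`, `plan`, `heal_once`) are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Knowledge:
    """Shared knowledge base (the K in MAPE-K). All contents are hypothetical."""
    # fault type -> predefined recovery strategies, in their default order
    strategies: Dict[str, List[str]]
    # (fault, strategy) -> [successes, attempts], updated after each recovery
    stats: Dict[Tuple[str, str], List[int]] = field(default_factory=dict)

    def success_rate(self, fault: str, strategy: str) -> float:
        ok, tried = self.stats.get((fault, strategy), [0, 0])
        return ok / tried if tried else 0.5  # neutral score for untried strategies


def monitor(service: dict) -> dict:
    """Monitor: take a snapshot of the (simulated) service metrics."""
    return dict(service)


def analyze(metrics: dict) -> Optional[str]:
    """Analyze: map metrics to a fault label. Thresholds are assumptions."""
    if not metrics["alive"]:
        return "service_crash"
    if metrics["memory_mb"] > 900:
        return "memory_leak"
    if not metrics["db_connected"]:
        return "db_disconnect"
    return None


def plan(fault: str, kb: Knowledge) -> List[str]:
    """Plan: order the predefined strategies by their observed success rate."""
    return sorted(kb.strategies[fault],
                  key=lambda s: kb.success_rate(fault, s), reverse=True)


def _restart(service: dict) -> None:
    service.update(alive=True, memory_mb=200.0)


def _clear_cache(service: dict) -> None:
    service["memory_mb"] -= 100.0


def _reconnect_db(service: dict) -> None:
    service["db_connected"] = True


# Hypothetical recovery actions acting on the simulated service.
ACTIONS: Dict[str, Callable[[dict], None]] = {
    "restart_service": _restart,
    "clear_cache": _clear_cache,
    "reconnect_db": _reconnect_db,
}


def heal_once(service: dict, kb: Knowledge) -> List[str]:
    """Run one MAPE-K cycle and return the strategies that were attempted."""
    fault = analyze(monitor(service))
    attempted: List[str] = []
    if fault is None:
        return attempted
    for name in plan(fault, kb):
        attempted.append(name)
        ACTIONS[name](service)                        # Execute
        fixed = analyze(monitor(service)) != fault    # verify by re-monitoring
        record = kb.stats.setdefault((fault, name), [0, 0])
        record[0] += int(fixed)                       # feedback into knowledge
        record[1] += 1
        if fixed:
            break
    return attempted
```

In this sketch, a simulated memory leak at 1200 MB with the default order `["clear_cache", "restart_service"]` first tries the cache clear, which fails, and then the restart, which succeeds. On a second injection of the same fault the recorded outcomes move `restart_service` to the front, so recovery completes in one attempt. This is one simple way feedback over repeated cycles could shorten recovery while still relying on predefined strategies.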