Modern database management systems (DBMS) face significant challenges in maintaining performance and availability under dynamic workloads. This paper proposes a novel self-healing framework that integrates Model-Agnostic Meta-Learning (MAML) for few-shot anomaly detection, Graph Neural Networks (GNNs) for dependency-driven cascading failure prediction, and multi-objective Reinforcement Learning (RL) for autonomous recovery. Unlike existing database tuning systems that focus primarily on offline configuration optimization, our framework enables real-time, end-to-end self-healing by rapidly adapting to unseen workload patterns with minimal labeled data. We introduce dynamic GNN-based dependency modeling that captures workload-dependent relationships between database components, enabling proactive cascade prevention. A scalarized multi-objective RL formulation balances latency, resource utilization, and cost during recovery, while SHAP-based explainability ensures operational transparency. Evaluations on Google Cluster Data and TPC benchmarks demonstrate 90.5\% anomaly detection F1-score with 5-shot adaptation, 90.1\% cascade prediction accuracy, and 85.1\% latency reduction in recovery actions, outperforming strong baselines including Isolation Forest, LSTM autoencoders, static GCN, and standard RL methods.
翻译:暂无翻译