Health management of network devices and systems is the foundation of modern network operations and maintenance. Traditional health management methods, which rely on expert identification or simple rule-based algorithms, struggle to cope with heterogeneous network (HN) environments. Moreover, current state-of-the-art distributed fault diagnosis methods, which rely on specific machine learning techniques, lack multi-scale adaptivity to heterogeneous device information, resulting in unsatisfactory diagnostic accuracy in HNs. In this paper, we develop an LLM-assisted, end-to-end intelligent network health management framework. First, the framework introduces a multi-scale data scaling method based on unsupervised learning to address the multi-scale data problem in HNs. Second, we combine a semantic rule tree with an attention mechanism to propose a Multi-Scale Semanticized Anomaly Detection Model (MSADM), which generates network semantic information while detecting anomalies. Finally, we embed a chain-of-thought-based large language model (LLM) downstream to adaptively analyze the fault diagnosis results and produce an analysis report containing detailed fault information and optimization strategies. We compare our scheme with other fault diagnosis models and demonstrate that it performs well on several network fault diagnosis metrics.
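To make the multi-scale data problem concrete, the following is a minimal sketch of one way readings from heterogeneous devices could be brought onto a common scale without labels. It is an illustration only and not the scaling method of the framework, whose details the abstract does not give. The 1-D k-means grouping, the function names `kmeans_1d` and `scale_by_cluster`, and the per-cluster z-score are all assumptions introduced here.

```python
import math
import statistics


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means (hypothetical helper, not from the paper).

    Centroids are initialised deterministically from evenly spaced
    positions in the sorted input, so results are reproducible.
    """
    ordered = sorted(values)
    centroids = [ordered[(len(ordered) - 1) * i // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(statistics.fmean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def scale_by_cluster(readings, k=2):
    """Group devices by order of magnitude, then z-score within each group.

    readings: dict mapping device id -> list of positive floats for one metric.
    Returns a dict mapping device id -> list of scaled values.
    Assumption: devices whose median readings share an order of magnitude
    can share one mean and standard deviation.
    """
    devices = list(readings)
    # One unsupervised feature per device: log10 of its median reading.
    features = [math.log10(statistics.median(readings[d])) for d in devices]
    labels = kmeans_1d(features, k)
    scaled = {}
    for c in set(labels):
        group = [d for d, lab in zip(devices, labels) if lab == c]
        pooled = [x for d in group for x in readings[d]]
        mu = statistics.fmean(pooled)
        sigma = statistics.pstdev(pooled) or 1.0  # guard against constant data
        for d in group:
            scaled[d] = [(x - mu) / sigma for x in readings[d]]
    return scaled
```

Under these assumptions, a sensor reporting values around 2 and a router reporting values around 2000 fall into different clusters, and each is scaled relative to its own group, so a downstream detector sees comparable magnitudes from both.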