Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, which rely on expert identification or simple rule-based algorithms, struggle to cope with heterogeneous network (HN) environments. Moreover, current state-of-the-art distributed fault diagnosis methods, built on specific machine learning techniques, lack multi-scale adaptivity to heterogeneous device information, resulting in unsatisfactory diagnostic accuracy in HNs. In this paper, we develop an end-to-end intelligent network health management framework assisted by a large language model (LLM). First, the framework introduces a multi-scale data scaling method based on unsupervised learning to address the multi-scale data problem in HNs. Second, we combine a semantic rule tree with an attention mechanism to propose a Multi-Scale Semanticized Anomaly Detection Model (MSADM), which generates network semantic information while detecting anomalies. Finally, we embed a chain-of-thought-based LLM downstream to adaptively analyze the fault diagnosis results and produce an analysis report containing detailed fault information and optimization strategies. We compare our scheme with other fault diagnosis models and show that it performs well across several network fault diagnosis metrics.