In this work, we share our experience with tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents. To organize this expert knowledge uniformly, we propose a tele-knowledge graph (Tele-KG). Using this data, we further propose a tele-domain language pre-training model, TeleBERT, and its knowledge-enhanced version, the tele-knowledge re-training model KTeleBERT, which includes effective prompt hints, adaptive numerical data encoding, and two knowledge injection paradigms. Concretely, our proposal has two stages: first, pre-training TeleBERT on 20 million tele-related corpora, and then re-training it on 1 million causal and machine-related corpora to obtain KTeleBERT. Our evaluation on multiple fault-analysis tasks in tele-applications, including root-cause analysis, event association prediction, and fault chain tracing, shows that pre-training a language model on tele-domain data benefits downstream tasks. Moreover, KTeleBERT re-training further improves the performance of task models, highlighting the effectiveness of incorporating diverse tele-knowledge into the model.