Diagnosing Quality-of-Experience (QoE) degradations in operational Radio Access Networks (RANs) is a critical but notoriously complex task, traditionally requiring labor-intensive expert analysis of high-dimensional, cross-layer telemetry. While Large Language Models (LLMs) offer unprecedented reasoning capabilities, they are fundamentally unsuited to troubleshooting RANs from raw data: they fail at numeric time-series analysis, hallucinate protocol-violating causal links, and lack the stateful rigor required for multi-step fault localization. To bridge this gap, we present QoEReasoner, an end-to-end, LLM-driven agentic system for automated and explainable QoE diagnosis. QoEReasoner tames the inherent unpredictability of LLMs by grounding their reasoning in the physical realities of the network. It employs deterministic tools to reliably translate raw numeric KPIs into structured evidence, enforces protocol-consistent fault propagation through a domain-specific Knowledge Base, and leverages a Historical Bank of expert-validated cases to guide hypothesis generation. A stateful central planner orchestrates this closed-loop process across anomaly detection, causal tracing, and root-cause localization. Evaluations on real-world operational RAN datasets demonstrate that QoEReasoner outperforms strong baselines by 18% to 40% in accuracy across multiple diagnostic tasks. Furthermore, it reduces diagnostic time from approximately 30 minutes of manual expert analysis to just 3 minutes per session, delivering highly interpretable, expert-grade reports while remaining robust across diverse LLM backbones.