The success of large language models (LLMs) depends heavily on large-scale, high-quality instruction-following and reinforcement datasets. However, generating such data through human annotation is prohibitively time-consuming, particularly for domain-specific tasks like telecom network troubleshooting, where accurate responses require deep technical expertise and contextual understanding. In this paper, we present a fully automated, retrieval-augmented pipeline for generating synthetic question-answer (QA) pairs grounded in structured domain knowledge. Our multi-stage framework integrates a retriever, a base generator, and a refinement model to synthesize and enhance QA pairs using documents retrieved from a domain-specific knowledge graph. To ensure data quality, we employ customized RAGAS-based scoring to filter out low-quality samples, producing a high-quality dataset suitable for reinforcement fine-tuning (RFT). We demonstrate our approach in a real-world telecom scenario focused on radio access network (RAN) troubleshooting. The resulting pipeline generates complex, context-rich troubleshooting solution plans without human intervention. This work offers a scalable solution for building instruction and reinforcement datasets in specialized domains, significantly reducing dependence on manual labeling while maintaining high technical fidelity.
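The retrieve → generate → refine → score-and-filter loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the retriever, generator, refiner, and scorer below are hypothetical toy stand-ins (the real pipeline uses a knowledge-graph retriever, LLM-based generation and refinement, and customized RAGAS metrics).

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str
    context: str  # source document the pair is grounded in

def retrieve(query: str, kg: dict) -> list[str]:
    # Toy retriever: keyword match over a dict-backed "knowledge graph".
    # In the real pipeline this is a retriever over a domain-specific KG.
    return [doc for key, doc in kg.items() if key in query]

def generate_qa(doc: str) -> QAPair:
    # Toy base generator: in practice an LLM drafts a QA pair from the doc.
    return QAPair(question=f"How to resolve: {doc}?",
                  answer=f"Plan for {doc}",
                  context=doc)

def refine(qa: QAPair) -> QAPair:
    # Toy refinement model: in practice an LLM enriches the draft answer.
    qa.answer += " (refined)"
    return qa

def score(qa: QAPair) -> float:
    # Toy groundedness proxy standing in for customized RAGAS scoring:
    # here, simply whether the answer references its source context.
    return 1.0 if qa.context in qa.answer else 0.0

def build_dataset(queries: list[str], kg: dict, threshold: float = 0.8) -> list[QAPair]:
    # Full loop: retrieve, generate, refine, then keep only samples
    # whose quality score clears the threshold.
    dataset = []
    for query in queries:
        for doc in retrieve(query, kg):
            qa = refine(generate_qa(doc))
            if score(qa) >= threshold:
                dataset.append(qa)
    return dataset
```

The key design point is the final filter: scoring every synthesized pair and discarding those below a threshold is what keeps the fully automated dataset clean enough for RFT without any human review.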