Towards Small Language Models for Security Query Generation in SOC Workflows

Analysts in Security Operations Centers (SOCs) routinely query massive telemetry streams using Kusto Query Language (KQL). Writing correct KQL requires specialized expertise, and this dependency creates a bottleneck as security teams scale. This paper investigates whether Small Language Models (SLMs) can enable accurate, cost-effective natural-language-to-KQL translation for enterprise security. We propose a three-knob framework targeting prompting, fine-tuning, and architecture design. First, we adapt the existing NL2KQL framework for SLMs with lightweight retrieval and introduce error-aware prompting that addresses common parser failures without increasing token count. Second, we apply LoRA fine-tuning with rationale distillation, augmenting each NLQ-KQL pair with a brief chain-of-thought explanation to transfer reasoning from a teacher model while keeping the SLM compact. Third, we propose a two-stage architecture that uses an SLM for candidate generation and a low-cost LLM judge for schema-aware refinement and selection. We evaluate nine models (five SLMs and four LLMs) across syntax correctness, semantic accuracy, table selection, and filter precision, alongside latency and token cost. On Microsoft's NL2KQL Defender Evaluation dataset, our two-stage approach achieves 0.987 syntax and 0.906 semantic accuracy. We further demonstrate generalizability on Microsoft Sentinel data, reaching 0.964 syntax and 0.831 semantic accuracy. These results come at up to 10x lower token cost than GPT-5, establishing SLMs as a practical, scalable foundation for natural-language querying in security operations.
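The two-stage idea described above (an SLM proposes several candidate queries, then a judge picks one using the schema) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: `stub_slm` and `stub_judge` are hypothetical stand-ins for the model calls, `SCHEMA` is a small illustrative table-to-column map, and the judge is reduced to a plain schema check that only selects and does not refine.

```python
from typing import Callable, Dict, List, Set

# Illustrative schema (assumption): table name -> set of column names.
SCHEMA: Dict[str, Set[str]] = {
    "DeviceProcessEvents": {"Timestamp", "DeviceName", "FileName"},
    "DeviceNetworkEvents": {"Timestamp", "DeviceName", "RemoteIP"},
}


def stub_slm(question: str, n: int) -> List[str]:
    """Hypothetical stand-in for the SLM: returns up to n candidate KQL strings."""
    candidates = [
        # Misspelled table name, so the schema check rejects it.
        "DeviceProcessEvent | where FileName == 'powershell.exe'",
        # Valid table and valid column.
        "DeviceProcessEvents | where FileName == 'powershell.exe'",
        # Valid table, but FileName is not one of its columns in SCHEMA.
        "DeviceNetworkEvents | where FileName == 'powershell.exe'",
    ]
    return candidates[:n]


def stub_judge(question: str, candidate: str, schema: Dict[str, Set[str]]) -> float:
    """Hypothetical stand-in for the LLM judge: scores a candidate by schema fit."""
    stages = [stage.strip() for stage in candidate.split("|")]
    table = stages[0]
    if table not in schema:
        return 0.0  # unknown table: lowest score
    score = 1.0  # known table
    for stage in stages[1:]:
        if stage.startswith("where "):
            column = stage.split()[1]  # first token after "where"
            if column in schema[table]:
                score += 1.0  # filter uses a column that exists in the table
    return score


def two_stage(
    question: str,
    generate: Callable[[str, int], List[str]],
    judge: Callable[[str, str, Dict[str, Set[str]]], float],
    schema: Dict[str, Set[str]],
    n: int = 3,
) -> str:
    """Stage 1: generate candidates. Stage 2: return the highest-scoring one."""
    candidates = generate(question, n)
    if not candidates:
        raise ValueError("generator returned no candidates")
    return max(candidates, key=lambda c: judge(question, c, schema))


if __name__ == "__main__":
    print(two_stage("Which devices launched PowerShell?", stub_slm, stub_judge, SCHEMA))
```

In a real system both stubs would be replaced by model calls, and the judge would presumably also repair near-miss candidates rather than only rank them; the sketch is meant only to show the division of labor between the cheap generator and the schema-aware selector.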
