Indicators of Compromise (IoCs) are critical for threat detection and response, marking malicious activity across networks and systems. Yet, the effectiveness of automated IoC extraction systems is fundamentally limited by one key issue: the lack of high-quality ground truth. Current extraction tools rely either on manually extracted ground truth, which is labor-intensive and costly, or on automated ground truth creation methods that include non-malicious artifacts, leading to inflated false positive (FP) rates and unreliable threat intelligence. In this work, we analyze the shortcomings of existing ground truth creation strategies and address them by introducing the first hybrid human-in-the-loop pipeline for IoC extraction, which combines a large language model-based classifier (LANCE) with expert analyst validation. Our system improves precision through explainable, context-aware labeling and reduces analysts' work factor by 43% compared to manual annotation, as demonstrated in our evaluation with six analysts. Using this approach, we produce PRISM, a high-quality, publicly available benchmark of 1,791 labeled IoCs from 50 real-world threat reports. PRISM supports both fair evaluation and training of IoC extraction methods and enables reproducible research grounded in expert-validated indicators.