To adapt to a constantly evolving landscape of cyber threats, organizations need to actively collect Indicators of Compromise (IOCs), i.e., forensic artifacts that signal that a host or network might have been compromised. IOCs can be collected through open-source and commercial structured IOC feeds, but they can also be extracted from a myriad of unstructured threat reports written in natural language and distributed through a wide array of sources such as blogs and social media. Multiple indicator extraction tools can identify IOCs in natural language reports, but their accuracy is hard to compare because large ground truth datasets are difficult to build. This work presents a novel majority vote methodology for comparing the accuracy of indicator extraction tools, which does not require a manually built ground truth. We implement our methodology in GoodFATR, an automated platform for collecting threat reports from a wealth of sources, extracting IOCs from the collected reports using multiple tools, and comparing the accuracy of those tools. GoodFATR supports 6 threat report sources: RSS, Twitter, Telegram, Malpedia, APTnotes, and ChainSmith. GoodFATR continuously monitors the sources, downloads new threat reports, extracts 41 indicator types from the collected reports, and filters out non-malicious indicators to output the IOCs. We run GoodFATR over 15 months to collect 472,891 reports from the 6 sources, extract 978,151 indicators from the reports, and identify 618,217 IOCs. We analyze the collected data to identify the top IOC contributors and the IOC class distribution. We apply GoodFATR to compare the IOC extraction accuracy of 7 popular open-source tools against that of GoodFATR's own indicator extraction module.
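The abstract does not spell out how the majority vote is computed, so the following is only a minimal sketch of one plausible reading: an indicator found in a report is treated as correct when more than half of the tools extract it, and each tool is then scored against that consensus. The function name `majority_vote_scores`, the strict-majority threshold, and the precision/recall scoring are all assumptions made for illustration, not details taken from GoodFATR.

```python
from collections import Counter


def majority_vote_scores(extractions):
    """Score tools against a majority-vote consensus for one report.

    extractions: dict mapping a tool name to the set of indicators that
    tool extracted from the same report (hypothetical input format).
    Returns (consensus, scores), where scores maps each tool to a
    (precision, recall) pair measured against the consensus set.
    """
    n_tools = len(extractions)

    # Count how many tools extracted each indicator.
    votes = Counter()
    for indicators in extractions.values():
        votes.update(set(indicators))

    # Assumption: an indicator is accepted only with a strict majority.
    consensus = {ioc for ioc, count in votes.items() if count > n_tools / 2}

    scores = {}
    for tool, indicators in extractions.items():
        found = set(indicators)
        true_positives = len(found & consensus)
        precision = true_positives / len(found) if found else 0.0
        recall = true_positives / len(consensus) if consensus else 0.0
        scores[tool] = (precision, recall)
    return consensus, scores


if __name__ == "__main__":
    # Made-up tool names and indicators, for illustration only.
    example = {
        "tool_a": {"1.2.3.4", "evil.example"},
        "tool_b": {"1.2.3.4", "evil.example", "readme.txt"},
        "tool_c": {"1.2.3.4"},
    }
    consensus, scores = majority_vote_scores(example)
    print(sorted(consensus))
    for tool, (precision, recall) in sorted(scores.items()):
        print(f"{tool}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy input, `readme.txt` is extracted by only one of three tools and is therefore excluded from the consensus, which lowers the precision of `tool_b`, while `tool_c` loses recall for missing `evil.example`. A real comparison would aggregate such scores over many reports and would have to decide how to treat ties when the number of tools is even.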