The Rise of GoodFATR: A Novel Accuracy Comparison Methodology for Indicator Extraction Tools

To adapt to a constantly evolving landscape of cyber threats, organizations need to actively collect Indicators of Compromise (IOCs), i.e., forensic artifacts that signal that a host or network might have been compromised. IOCs can be collected through open-source and commercial structured IOC feeds. However, they can also be extracted from a myriad of unstructured threat reports written in natural language and distributed through a wide array of sources such as blogs and social media. Multiple indicator extraction tools can identify IOCs in natural language reports, but their accuracy is hard to compare because large ground truth datasets are difficult to build. This work presents a novel majority vote methodology for comparing the accuracy of indicator extraction tools that does not require a manually built ground truth. We implement our methodology in GoodFATR, an automated platform for collecting threat reports from a wealth of sources, extracting IOCs from the collected reports using multiple tools, and comparing their accuracy. GoodFATR supports 6 threat report sources: RSS, Twitter, Telegram, Malpedia, APTnotes, and ChainSmith. GoodFATR continuously monitors the sources, downloads new threat reports, extracts 41 indicator types from the collected reports, and filters non-malicious indicators to output the IOCs. We run GoodFATR over 15 months to collect 472,891 reports from the 6 sources, extract 978,151 indicators from the reports, and identify 618,217 IOCs. We analyze the collected data to identify the top IOC contributors and the IOC class distribution. Finally, we apply GoodFATR to compare the IOC extraction accuracy of 7 popular open-source tools with that of GoodFATR's own indicator extraction module.
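The abstract does not spell out the exact voting rule, so the following is only a minimal sketch of how a majority vote can stand in for a manually built ground truth. It assumes that, for each report, an indicator counts as correct when a strict majority of the compared tools extract it, that every tool processed every report, and that tool outputs were already normalized to `type:value` strings. The names `majority_consensus` and `compare_tools` are hypothetical and are not GoodFATR's API.

```python
from collections import Counter


def majority_consensus(per_tool: dict[str, set[str]]) -> set[str]:
    """Indicators extracted by strictly more than half of the tools for one report."""
    votes = Counter(ind for found in per_tool.values() for ind in found)
    return {ind for ind, n in votes.items() if n > len(per_tool) / 2}


def compare_tools(reports: dict[str, dict[str, set[str]]]) -> dict[str, dict[str, float]]:
    """Score each tool against the per-report majority consensus (hypothetical helper)."""
    tools = sorted({tool for per_tool in reports.values() for tool in per_tool})
    tp, predicted = Counter(), Counter()
    consensus_size = 0
    for per_tool in reports.values():
        # Assumption: a tool with no entry for a report extracted nothing from it.
        outputs = {tool: per_tool.get(tool, set()) for tool in tools}
        consensus = majority_consensus(outputs)
        consensus_size += len(consensus)
        for tool, found in outputs.items():
            tp[tool] += len(found & consensus)   # indicators agreeing with the majority
            predicted[tool] += len(found)        # everything the tool reported
    scores = {}
    for tool in tools:
        precision = tp[tool] / predicted[tool] if predicted[tool] else 0.0
        recall = tp[tool] / consensus_size if consensus_size else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[tool] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


if __name__ == "__main__":
    # Toy input with made-up tool names and indicators, for illustration only.
    reports = {
        "r1": {
            "toolA": {"ip:1.2.3.4", "domain:evil.example"},
            "toolB": {"ip:1.2.3.4", "domain:evil.example", "domain:microsoft.com"},
            "toolC": {"ip:1.2.3.4"},
        },
    }
    for tool, s in compare_tools(reports).items():
        print(tool, s)
```

In the toy input, `domain:microsoft.com` is extracted by only one of three tools, so it falls outside the consensus and lowers toolB's precision, while toolC loses recall for missing `domain:evil.example`. One consequence of this design is that an error shared by most tools would be counted as correct, so the consensus approximates, rather than replaces, a ground truth.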