This paper proposes a machine learning-based approach for detecting the exploitation of vulnerabilities in the wild by monitoring underground hacking forums. The growing volume of posts discussing in-the-wild exploitation calls for an automated approach that processes threads and posts and ultimately raises alarms depending on their content. To illustrate the proposed system, we use the CrimeBB dataset, which contains data scraped from multiple underground forums, and develop a supervised machine learning model that filters threads citing CVEs and labels them as Proof-of-Concept, Weaponization, or Exploitation. Using random forests, we show that accuracy, precision, and recall above 0.99 are attainable for this classification task. Additionally, we provide insights into how weaponization and exploitation differ in nature, e.g., by interpreting the output of a decision tree, and analyze profits and other aspects of the hacking communities. Overall, our work sheds light on the exploitation of vulnerabilities in the wild and can provide additional ground truth for models such as EPSS and Expected Exploitability.
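The abstract does not specify how threads citing CVEs are identified. What follows is a minimal sketch of that filtering step, which runs before classification. It assumes a regular-expression match on the standard CVE ID syntax: `CVE-`, a four-digit year, and a sequence number of four or more digits. The function names `extract_cves` and `filter_threads`, and the choice to represent a thread as a list of post strings, are hypothetical. The sketch does not reproduce the random-forest classifier that assigns the Proof-of-Concept, Weaponization, or Exploitation labels.

```python
from __future__ import annotations

import re

# Hypothetical filter (not the paper's code): matches CVE IDs such as
# CVE-2021-44228, i.e. a four-digit year and a sequence number of 4+ digits.
CVE_PATTERN = re.compile(r"\bCVE-([0-9]{4})-([0-9]{4,})\b", re.IGNORECASE)


def extract_cves(text: str) -> list[str]:
    """Return distinct CVE IDs cited in a post, upper-cased, in first-seen order."""
    seen: dict[str, None] = {}  # dict keys preserve insertion order
    for year, seq in CVE_PATTERN.findall(text):
        seen.setdefault(f"CVE-{year}-{seq}", None)
    return list(seen)


def filter_threads(threads: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep only threads where at least one post cites a CVE.

    Maps each retained thread ID to the distinct CVE IDs cited across its posts;
    these threads would then be passed on to the classifier.
    """
    result: dict[str, list[str]] = {}
    for thread_id, posts in threads.items():
        cves: dict[str, None] = {}
        for post in posts:
            for cve in extract_cves(post):
                cves.setdefault(cve, None)
        if cves:
            result[thread_id] = list(cves)
    return result
```

For example, given `{"t1": ["PoC for CVE-2021-44228 works", "also see cve-2021-45046"], "t2": ["selling accounts"]}`, `filter_threads` keeps only `t1` and maps it to `["CVE-2021-44228", "CVE-2021-45046"]`. The IDs are normalized to upper case so that the same CVE, written in different casing across posts, counts once.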