Cyberattacks cause billions of dollars in damage annually, and malicious hackers often share exploit code and techniques on underground forums. Identifying which organizations these exploits target is critical for proactive Cyber Threat Intelligence (CTI). To address this need, we propose Temporal Representation and Classification of Exploits (TRACE), a vendor-conditioned contrastive learning framework built on CySecBERT that jointly optimizes organizational target classification and vendor-coherent representations, and we evaluate its robustness under temporal distribution shift. Unlike prior work limited to small, single-source datasets, we leverage a large-scale, multi-source corpus spanning nine exploit databases and hacker forums. The corpus comprises 352,866 posts collected over three decades and yields a 129,126-sample dataset across seven organizational categories. In the temporal out-of-distribution evaluation, TRACE achieves a macro F1 of 97.00%, substantially outperforming 17 benchmark methods spanning classical ML, deep learning with GloVe/FastText embeddings, and pretrained transformer models.
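To make the evaluation protocol concrete, the following is a minimal sketch of a temporal out-of-distribution split and a macro F1 computation. It is not the authors' implementation: the `Post` record, the `cutoff_year` parameter, and the labels `org_a` and `org_b` are hypothetical names introduced for illustration, and the abstract does not state how the split point is chosen.

```python
from collections import namedtuple

# Hypothetical record type; the real corpus schema is not given in the abstract.
Post = namedtuple("Post", ["year", "text", "label"])


def temporal_split(posts, cutoff_year):
    """Train on posts dated before cutoff_year, test on posts from cutoff_year onward."""
    train = [p for p in posts if p.year < cutoff_year]
    test = [p for p in posts if p.year >= cutoff_year]
    return train, test


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy data with placeholder labels (assumed, not from the paper).
posts = [
    Post(2005, "exploit a", "org_a"),
    Post(2012, "exploit b", "org_b"),
    Post(2021, "exploit c", "org_a"),
    Post(2023, "exploit d", "org_b"),
]
train, test = temporal_split(posts, cutoff_year=2020)
score = macro_f1(["org_a", "org_a", "org_b"], ["org_a", "org_b", "org_b"])
```

Because macro F1 averages per-class scores without weighting by class frequency, a model cannot reach a high value by performing well only on the most common organizational categories.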