Hacker forums provide critical early warning signals for emerging cybersecurity threats, but extracting actionable intelligence from their unstructured and noisy content remains a significant challenge. This paper presents an unsupervised framework that automatically detects, clusters, and prioritizes security events discussed across hacker forum posts. Our approach leverages Transformer-based embeddings fine-tuned with contrastive learning to group related discussions into distinct security event clusters, identifying incidents like zero-day disclosures or malware releases without relying on predefined keywords. The framework incorporates a daily ranking mechanism that prioritizes identified events using quantifiable metrics reflecting timeliness, source credibility, information completeness, and relevance. Experimental evaluation on real-world hacker forum data demonstrates that our method effectively reduces noise and surfaces high-priority threats, enabling security analysts to mount proactive responses. By transforming disparate hacker forum discussions into structured, actionable intelligence, our work addresses fundamental challenges in automated threat detection and analysis.
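As a rough illustration of the daily ranking mechanism summarized above, the sketch below scores each event cluster on the four named metrics and sorts clusters by the result. The abstract does not say how the metrics are combined, so the weighted sum, the equal weights, the [0, 1] score range, and the names `EventCluster`, `WEIGHTS`, `priority`, and `rank_daily` are all assumptions made for illustration, not the framework's actual method.

```python
from dataclasses import dataclass


@dataclass
class EventCluster:
    """One security event cluster with per-metric scores in [0, 1] (assumed range)."""
    name: str
    timeliness: float
    credibility: float
    completeness: float
    relevance: float


# Hypothetical equal weights; the abstract does not specify any weighting.
WEIGHTS = {
    "timeliness": 0.25,
    "credibility": 0.25,
    "completeness": 0.25,
    "relevance": 0.25,
}


def priority(event, weights=WEIGHTS):
    """Combine the four metric scores into one value (assumed weighted sum)."""
    return sum(weight * getattr(event, metric) for metric, weight in weights.items())


def rank_daily(events, weights=WEIGHTS):
    """Return one day's event clusters ordered from highest to lowest priority."""
    return sorted(events, key=lambda e: priority(e, weights), reverse=True)


if __name__ == "__main__":
    # Made-up scores, purely to show the call pattern.
    todays_events = [
        EventCluster("forum chatter", 0.2, 0.2, 0.2, 0.2),
        EventCluster("zero-day disclosure", 0.9, 0.8, 0.7, 0.6),
    ]
    for event in rank_daily(todays_events):
        print(event.name, round(priority(event), 3))
```

Keeping the weights in a single mapping makes it straightforward to emphasize one metric, such as timeliness, without changing the ranking code.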