The rapid spread of fake news and disinformation campaigns poses a significant threat to public trust, political stability, and cybersecurity. Traditional Cyber Threat Intelligence (CTI) approaches rely on low-level indicators such as domain names and social media handles, which adversaries easily evade by frequently modifying their online infrastructure. To address these limitations, we introduce a novel CTI framework that focuses on high-level, semantic indicators derived from the recurring narratives and relationships of disinformation campaigns. Our approach uses Large Language Models (LLMs) to extract structured CTI indicators from unstructured disinformation content, capturing key entities and their contextual dependencies within fake news. We further introduce FakeCTI, the first dataset that systematically links fake news to disinformation campaigns and threat actors. To evaluate the effectiveness of our CTI framework, we analyze multiple fake news attribution techniques, ranging from traditional Natural Language Processing (NLP) to fine-tuned LLMs. This work shifts the focus from low-level artifacts to persistent conceptual structures, establishing a scalable and adaptive approach to tracking and countering disinformation campaigns.