RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines

Retrieval-Augmented Generation (RAG) has emerged as the dominant architectural pattern for operationalizing Large Language Models (LLMs) in Cyber Threat Intelligence (CTI) systems. However, this design is susceptible to poisoning attacks, and previously proposed defenses can fail in CTI contexts: information about emerging attacks is often entirely new, and sophisticated threat actors can mimic legitimate formats, terminology, and stylistic conventions. To address this issue, we propose strengthening modern RAG defenses by applying source credibility algorithms to the corpus, using PageRank as an example. Using the standardized MS MARCO dataset, we demonstrate quantitatively that our algorithm assigns lower authority scores to malicious documents while promoting trusted content. We also demonstrate proof-of-concept performance of our algorithm on CTI documents and feeds.
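The abstract does not specify how authority scores enter the retrieval step, so the following is only a minimal sketch under stated assumptions, not the RAGRank implementation. It assumes documents form a hypothetical reference graph (an edge u → v means "u references v"), computes authority with plain power-iteration PageRank, and blends each candidate's retriever similarity with its max-normalized authority using a hypothetical weight `alpha`. The document IDs, similarity values, and the `rerank` helper are illustrative inventions, not data or code from the paper.

```python
# Illustrative sketch only; not the RAGRank authors' code.
from typing import Dict, List, Tuple

# Hypothetical toy corpus: an edge u -> v means "document u references document v".
LINKS: Dict[str, List[str]] = {
    "advisory": ["nvd", "attack"],
    "vendor_blog": ["advisory", "nvd"],
    "nvd": ["attack"],
    "attack": ["advisory", "vendor_blog"],
    # Poisoned document: it cites trusted sources, but nothing cites it.
    "injected": ["advisory", "nvd"],
}

# Hypothetical (doc_id, retriever similarity) pairs for one query.
CANDIDATES: List[Tuple[str, float]] = [
    ("injected", 0.95),  # crafted to match the query closely
    ("advisory", 0.80),
    ("vendor_blog", 0.78),
]


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Plain power-iteration PageRank; returned scores sum to 1."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping + damping * dangling) / n for u in nodes}
        # Each document passes a damped share of its rank to the documents it references.
        for u, outs in links.items():
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def rerank(candidates: List[Tuple[str, float]], authority: Dict[str, float],
           alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Blend similarity with max-normalized authority (alpha is a hypothetical weight)."""
    top = max(authority.values())
    scored = [(doc, (1 - alpha) * sim + alpha * authority.get(doc, 0.0) / top)
              for doc, sim in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranks = pagerank(LINKS)
    for doc, score in rerank(CANDIDATES, ranks):
        print(f"{doc:12s} {score:.3f}")
```

In this toy graph the injected document mimics legitimate content by citing trusted sources, but no document links back to it, so it keeps only the teleportation share (1 − d)/N of the rank and falls to last place despite having the highest raw similarity. A linear blend is just one plausible choice; filtering out documents below an authority threshold would be an equally reasonable design.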


Related content

PageRank


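PageRank (also called "web page ranking", "page level", "Google left-side ranking", or "Page ranking") is a technique, computed by [1], that ranks web pages from the hyperlinks between them. As one of the factors in Google's page ranking, it is named after the surname of Google co-founder Larry Page. Google uses it to reflect the relevance and importance of web pages, and in search engine optimization it is frequently used as one of the factors for evaluating how effective a page's optimization has been. Google's founders, Larry Page and Sergey Brin, invented the technique at Stanford University in 1998.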
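The hyperlink-based computation described above is commonly written in the following normalized form, where N is the number of pages, M(p_i) is the set of pages linking to p_i, L(p_j) is the number of outbound links on page p_j, and d is the damping factor (0.85 is the conventional choice). The scores are found by iterating the equation until they stop changing, and they sum to 1.

```latex
PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
```

Pages with no outbound links (dangling nodes) make L(p_j) undefined, so practical implementations typically redistribute their rank uniformly across all pages.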
