Safety, Security, and Cognitive Risks in Neuro-Symbolic AI

Neuro-symbolic AI (NeSy) pairs neural perception with symbolic reasoning, making it attractive for high-stakes domains where explainability and structured inference are required. However, this hybrid architecture enlarges the attack surface, which spans five layers: neural perception, symbolic knowledge bases, reasoning engines, agentic orchestration, and data stores. Each layer is exploitable in ways absent from purely neural systems. This paper makes six contributions: (1) formal definitions of NeSy Attack Surface, Symbolic Integrity Violation (SIV), and the Cross-Layer Amplification Ratio $\mathcal{X}$, which is decomposed into neural-caused and autonomous symbolic sensitivity components; (2) a unified threat model extending MITRE ATLAS with 11 NeSy-specific tactic extensions and a five-profile attacker taxonomy; (3) a symbolic-layer threat catalogue covering knowledge graph (KG) poisoning, ontology merging, and inference-engine subversion; (4) an analysis of cognitive risks (automation bias, authority bias, and sycophantic reinforcement) that are structurally amplified by NeSy's explicit logical explanations relative to black-box neural outputs; (5) interdisciplinary mitigations with measurable acceptance criteria aligned to NIST AI 600-1 and the EU AI Act; (6) three empirical benchmarks: (E1) targeted KG poisoning achieves break-even SIV at injection budget $B=5$ on a 205-entity medical KG, with a KG-specific stealth/SIV trade-off; (E2) on a DistilBERT+ProbLog pipeline, PGD-10 at $\varepsilon=0.01$ yields $\mathcal{X}=5.884$ (95% CI $[4.64,\, 8.00]$, $p<0.0001$), and a matched-random baseline ($E^{R}_{\mathrm{rand}}=0$) confirms that the effect is adversarially specific; (E3) single-axiom OWL edits achieve 93.3% SIV success with 100% Pellet-consistency stealth, but held-out STIX detection reaches only 50% (random-guessing level), which remains an open problem.
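The abstract reports $\mathcal{X}$ with a 95% confidence interval but does not give its formula. As a rough illustration only, the sketch below assumes a hypothetical stand-in: the ratio of the symbolic-layer error rate to the neural-layer error rate under attack, with a percentile bootstrap interval. The function names, the toy data, and the choice of bootstrap are all assumptions for illustration and are not the paper's definition or procedure.

```python
import random


def toy_amplification_ratio(neural_errors, symbolic_errors):
    """Hypothetical ratio: symbolic error rate / neural error rate.

    Both arguments are equal-length lists of 0/1 flags, one per attacked
    input. This is an assumed stand-in, not the paper's formal definition.
    """
    n = len(neural_errors)
    if n == 0 or n != len(symbolic_errors):
        raise ValueError("inputs must be non-empty and of equal length")
    neural_rate = sum(neural_errors) / n
    symbolic_rate = sum(symbolic_errors) / n
    if neural_rate == 0:
        raise ValueError("neural error rate is zero; ratio undefined")
    return symbolic_rate / neural_rate


def bootstrap_ci(neural_errors, symbolic_errors, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the toy ratio (assumed procedure)."""
    rng = random.Random(seed)
    n = len(neural_errors)
    stats = []
    for _ in range(n_boot):
        # Resample inputs with replacement, keeping each input's two flags paired.
        idx = [rng.randrange(n) for _ in range(n)]
        nf = [neural_errors[i] for i in idx]
        sf = [symbolic_errors[i] for i in idx]
        if sum(nf) == 0:
            continue  # ratio undefined for this resample; skip it
        stats.append(toy_amplification_ratio(nf, sf))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


# Made-up toy data: 10 of 100 inputs flip the neural output,
# 40 of 100 flip the downstream symbolic conclusion.
neural = [1] * 10 + [0] * 90
symbolic = [1] * 40 + [0] * 60
ratio = toy_amplification_ratio(neural, symbolic)
ci_low, ci_high = bootstrap_ci(neural, symbolic)
print(ratio, ci_low, ci_high)
```

Under this assumed reading, a ratio above 1 would mean that a small disturbance at the neural layer produces a proportionally larger disturbance in the symbolic conclusions.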

