Neuro-symbolic AI (NeSy) pairs neural perception with symbolic reasoning, which makes it attractive for high-stakes domains that require explainability and structured inference. However, this hybrid architecture enlarges the attack surface to five layers (neural perception, symbolic knowledge bases, reasoning engines, agentic orchestration, and data stores), each exploitable in ways absent from purely neural systems. This paper makes six contributions: (1) formal definitions of the NeSy Attack Surface, Symbolic Integrity Violation (SIV), and the Cross-Layer Amplification Ratio $\mathcal{X}$, with $\mathcal{X}$ decomposed into neural-caused and autonomous symbolic sensitivity components; (2) a unified threat model extending MITRE ATLAS with 11 NeSy-specific tactic extensions and a five-profile attacker taxonomy; (3) a symbolic-layer threat catalogue covering knowledge graph (KG) poisoning, ontology merging, and inference-engine subversion; (4) an analysis of cognitive risks (automation bias, authority bias, and sycophantic reinforcement) that are structurally amplified by NeSy's explicit logical explanations relative to black-box neural outputs; (5) interdisciplinary mitigations with measurable acceptance criteria aligned to NIST AI 600-1 and the EU AI Act; and (6) three empirical benchmarks. (E1) Targeted KG poisoning achieves break-even SIV at injection budget $B=5$ on a 205-entity medical KG, with a KG-specific stealth/SIV trade-off. (E2) On a DistilBERT+ProbLog pipeline, PGD-10 at $\varepsilon=0.01$ yields $\mathcal{X}=5.884$ (95% CI $[4.64,\, 8.00]$, $p<0.0001$); a matched-random baseline ($E^{R}_{\mathrm{rand}}=0$) confirms that the effect is specific to adversarial perturbations. (E3) Single-axiom OWL edits achieve 93.3% SIV success with 100% Pellet-consistency stealth, but held-out STIX detection reaches only 50% (random-guessing level) and remains an open problem.