Large Language Models (LLMs) are rapidly transitioning from conversational assistants to autonomous agents embedded in critical organizational functions, including Security Operations Centers (SOCs), financial systems, and infrastructure management. Current adversarial testing paradigms focus predominantly on technical attack vectors: prompt injection, jailbreaking, and data exfiltration. We argue this focus is catastrophically incomplete. LLMs, trained on vast corpora of human-generated text, have inherited not merely human knowledge but human \textit{psychological architecture} -- including the pre-cognitive vulnerabilities that render humans susceptible to social engineering, authority manipulation, and affective exploitation. This paper presents the first systematic application of the Cybersecurity Psychology Framework (\cpf{}), a 100-indicator taxonomy of human psychological vulnerabilities, to non-human cognitive agents. We introduce the \textbf{Synthetic Psychometric Assessment Protocol} (\sysname{}), a methodology for converting \cpf{} indicators into adversarial scenarios targeting LLM decision-making. Our preliminary hypothesis testing across seven major LLM families reveals a disturbing pattern: while models demonstrate robust defenses against traditional jailbreaks, they exhibit critical susceptibility to authority-gradient manipulation, temporal-pressure exploitation, and convergent-state attacks that mirror human cognitive failure modes. We term this phenomenon \textbf{Anthropomorphic Vulnerability Inheritance} (AVI) and propose that the security community must urgently develop ``psychological firewalls'' -- intervention mechanisms adapted from the Cybersecurity Psychology Intervention Framework (\cpif{}) -- to protect AI agents operating in adversarial environments.