Large language model (LLM) agents are rapidly becoming trusted copilots in high-stakes domains such as software development and healthcare. However, this deepening trust introduces a novel attack surface: Agent-Mediated Deception (AMD), in which compromised agents are weaponized against their human users. While extensive research has focused on agent-centric threats, human susceptibility to deception by a compromised agent remains unexplored. We present the first large-scale empirical study, with 303 participants, measuring human susceptibility to AMD. The study is built on HAT-Lab (Human-Agent Trust Laboratory), a high-fidelity research platform we developed, featuring nine carefully crafted scenarios that span everyday and professional domains (e.g., healthcare, software development, human resources). Our 10 key findings reveal significant vulnerabilities and suggest directions for future defenses. Specifically, only 8.6% of participants perceive AMD attacks, and domain experts show increased susceptibility in certain scenarios. We identify six cognitive failure modes in users and find that their risk awareness often fails to translate into protective behavior. Our defense analysis reveals that effective warnings must interrupt the user's workflow while keeping verification costs low. After experiential learning with HAT-Lab, over 90% of users who perceive the risks report increased caution toward AMD. This work provides both empirical evidence and a platform for human-centric agent security research.