Data agents integrate LLM-driven reasoning with relational data access, executable analytical tools, and multi-step workflow orchestration, making them increasingly central to enterprise analytics. This integration introduces new security vulnerabilities across data resources, database execution, and agent reasoning. It recombines concerns from database security and general-purpose LLM-agent security into failure modes that neither line of work captures on its own. To address this gap, we present a systematic security study of data agents. Our contributions are threefold. First, we develop a layered vulnerability framework that identifies eight data-agent-specific risks across the interpretation, execution, and policy layers. Second, we introduce an attack taxonomy organized by adversary goal, tactic, and technique, covering three goals, seven tactics, and fourteen techniques. We pair it with an LLM-driven payload generation pipeline grounded in real database schemas. Third, we evaluate these attacks on six systems: four open-source data agents and two production cloud analytics services. Our experiments reveal substantial security vulnerabilities across current systems and yield four key takeaways.