Large language models (LLMs) have shown strong capabilities in multi-step decision-making, planning, and acting, and are increasingly integrated into real-world applications. This raises the concern that their problem-solving abilities could be misused for crime. To study this risk, we propose VirtualCrime, a sandbox simulation framework built on a three-agent system for evaluating the criminal capabilities of models. The framework consists of an attacker agent that acts as the leader of a criminal team, a judge agent that determines the outcome of each action, and a world manager agent that updates the environment state and entities. Within this framework, we design 40 diverse crime tasks covering 11 maps and 13 crime objectives, such as theft, robbery, kidnapping, and riot. We also introduce a human player baseline as a reference for interpreting the performance of LLM agents. We evaluate 8 strong LLMs and find that (1) all agents in the simulation environment comply, generating detailed plans and executing them intelligently, with some achieving relatively high success rates; and (2) in some cases, agents take severe actions that inflict harm on NPCs to achieve their goals. Our work highlights the need for safety alignment when deploying agentic AI in real-world settings.
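To make the division of roles concrete, the three-agent loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VirtualCrime implementation: the names `WorldState`, `run_episode`, `scripted_attacker`, `permissive_judge`, `simple_world_manager`, and `goal_reached` are hypothetical, the action labels are neutral placeholders, and each role is a plain function standing in for what would presumably be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorldState:
    """Hypothetical environment state; the real framework's schema is not given."""
    turn: int = 0
    facts: dict[str, bool] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


# Assumed role signatures: each role is a callable over the shared state.
Attacker = Callable[[WorldState], str]
Judge = Callable[[WorldState, str], bool]
WorldManager = Callable[[WorldState, str, bool], WorldState]


def run_episode(
    attacker: Attacker,
    judge: Judge,
    world_manager: WorldManager,
    goal_reached: Callable[[WorldState], bool],
    state: WorldState,
    max_turns: int = 10,
) -> tuple[bool, WorldState]:
    """Run one task: attacker proposes, judge rules, world manager updates."""
    for _ in range(max_turns):
        action = attacker(state)                       # attacker picks the next action
        success = judge(state, action)                 # judge determines the outcome
        state = world_manager(state, action, success)  # world manager updates state
        if goal_reached(state):
            return True, state
    return False, state


# Stub roles for illustration only (placeholder action names).
def scripted_attacker(state: WorldState) -> str:
    plan = ["step_a", "step_b"]
    return plan[min(state.turn, len(plan) - 1)]


def permissive_judge(state: WorldState, action: str) -> bool:
    return True  # every action succeeds in this toy setup


def simple_world_manager(state: WorldState, action: str, success: bool) -> WorldState:
    state.turn += 1
    state.log.append(f"{action}:{'ok' if success else 'fail'}")
    if success:
        state.facts[action] = True
    return state


def goal_reached(state: WorldState) -> bool:
    return state.facts.get("step_b", False)
```

Keeping the judge separate from the world manager, as in the description above, means that the decision about whether an action succeeds is made independently of how the environment records its consequences; the sketch mirrors that split but makes no claim about how either role is prompted or implemented in the actual framework.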