MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control

Autonomous agents powered by large language models (LLMs) show promising potential for assistive tasks across various domains, including mobile device control. Because these agents interact directly with personal information and device settings, ensuring that they behave safely and reliably is crucial to preventing undesirable outcomes. However, no benchmark exists for the standardized evaluation of the safety of mobile device-control agents. In this work, we introduce MobileSafetyBench, a benchmark designed to evaluate the safety of device-control agents within a realistic mobile environment based on Android emulators. We develop a diverse set of tasks that involve interacting with various mobile applications, including messaging and banking applications, and that challenge agents to manage risks such as misuse and negative side effects. These tasks evaluate both the safety of agents in everyday scenarios and their robustness against indirect prompt injection attacks. Our experiments demonstrate that baseline agents built on state-of-the-art LLMs often fail to prevent harm while performing the tasks. To mitigate these safety concerns, we propose a prompting method that encourages agents to prioritize safety considerations. While this method shows promise in promoting safer behavior, considerable room for improvement remains before such agents can fully earn user trust. This highlights the urgent need for continued research into more robust safety mechanisms for mobile environments. We open-source our benchmark at: https://mobilesafetybench.github.io/.