Autonomous agents powered by large language models (LLMs) show promise in assistive tasks across various domains, including mobile device control. Because these agents interact directly with personal information and device settings, ensuring their safe and reliable behavior is crucial to prevent undesirable outcomes. However, no benchmark exists for the standardized evaluation of the safety of mobile device-control agents. In this work, we introduce MobileSafetyBench, a benchmark designed to evaluate the safety of device-control agents in a realistic mobile environment based on Android emulators. We develop a diverse set of tasks involving interactions with various mobile applications, including messaging and banking applications, that challenge agents to manage risks such as misuse and negative side effects. These tasks include tests that evaluate the safety of agents in daily scenarios as well as their robustness against indirect prompt injection attacks. Our experiments demonstrate that baseline agents, based on state-of-the-art LLMs, often fail to effectively prevent harm while performing the tasks. To mitigate these safety concerns, we propose a prompting method that encourages agents to prioritize safety considerations. While this method shows promise in promoting safer behaviors, there is still considerable room for improvement to fully earn user trust. This highlights the urgent need for continued research to develop more robust safety mechanisms in mobile environments.