Mobile task automation is an attractive technique that aims to enable voice-based, hands-free user interaction with smartphones. However, existing approaches scale poorly because of their limited language understanding and the non-trivial manual effort they require from developers or end users. Recent advances of large language models (LLMs) in language understanding and reasoning inspire us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system that can handle arbitrary tasks on any Android application without manual effort. The key insight is to combine the commonsense knowledge of LLMs with the domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the LLM with app-specific domain knowledge, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs, including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrate that AutoDroid generates actions with an accuracy of 90.9% and completes tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%, respectively. The demo, benchmark suites, and source code of AutoDroid will be released at https://autodroid-sys.github.io/.