Empowering LLM to use Smartphone for Intelligent Task Automation

Mobile task automation is an attractive technique that aims to enable voice-based, hands-free user interaction with smartphones. However, existing approaches scale poorly because of their limited language understanding and the non-trivial manual effort they require from developers or end users. Recent advances in the language understanding and reasoning of large language models (LLMs) inspire us to rethink the problem from a model-centric perspective, in which task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system that can handle arbitrary tasks on any Android application without manual effort. The key insight is to combine the commonsense knowledge of LLMs with the domain-specific knowledge of apps through automated dynamic analysis. The main components are a functionality-aware UI representation method that bridges the UI and the LLM, exploration-based memory injection techniques that augment the LLM's app-specific domain knowledge, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs, including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrate that AutoDroid generates actions with an accuracy of 90.9% and completes tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%, respectively. The demo, benchmark suites, and source code of AutoDroid will be released at https://autodroid-sys.github.io/.
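The abstract names a functionality-aware UI representation that bridges the UI and the LLM, but it does not give the format. The following is a minimal sketch of the general idea only, not AutoDroid's actual implementation: it assumes a hypothetical element schema (`kind`, `text`, `visible`) and renders the visible elements of a screen as indexed, HTML-like lines that can be placed in a prompt. The names `render_ui`, `build_prompt`, and `TAG_BY_KIND` are illustrative assumptions.

```python
from html import escape

# Hypothetical mapping from element kinds to HTML-like tags.
# The abstract does not specify a schema; this one is assumed.
TAG_BY_KIND = {
    "button": "button",
    "input": "input",
    "checkbox": "checkbox",
    "text": "p",
}


def render_ui(elements):
    """Render visible UI elements as indexed HTML-like lines.

    Each element is a dict with a "kind", an optional "text" label,
    and an optional "visible" flag (defaults to True).
    """
    lines = []
    index = 0
    for el in elements:
        if not el.get("visible", True):
            continue  # hidden elements cannot be acted on, so skip them
        tag = TAG_BY_KIND.get(el["kind"], "p")
        label = escape(el.get("text", ""))
        lines.append(f"<{tag} id={index}>{label}</{tag}>")
        index += 1
    return "\n".join(lines)


def build_prompt(task, elements, history=()):
    """Assemble a text prompt from the task, past actions, and the screen."""
    past = "\n".join(f"- {step}" for step in history) or "- (none)"
    return (
        f"Task: {task}\n"
        f"Previous actions:\n{past}\n"
        f"Current screen:\n{render_ui(elements)}\n"
        "Reply with the id of the element to act on and the action."
    )


screen = [
    {"kind": "button", "text": "New contact"},
    {"kind": "text", "text": "Debug overlay", "visible": False},
    {"kind": "input", "text": "Name"},
]
prompt = build_prompt("Add a contact named Alice", screen)
```

The resulting `prompt` string could be sent to any chat-style LLM; giving each element an index lets the model's reply refer to an element unambiguously.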

