Conversational agents promise to let users interact with mobile devices through natural language. However, to support diverse UI tasks with natural language, developers typically need to create a separate dataset and model for each task, which is expensive and labor-intensive. Recently, pre-trained large language models (LLMs) have been shown to generalize to various downstream tasks when prompted with a handful of examples from the target task. This paper investigates the feasibility of enabling versatile conversational interactions with mobile UIs using a single LLM. We designed prompting techniques to adapt an LLM to mobile UIs and experimented with four important modeling tasks that address various scenarios in conversational interaction. Our method achieved competitive performance on these challenging tasks without requiring dedicated datasets or training, offering a lightweight and generalizable approach to language-based mobile interaction.