GUI Agents: A Survey

Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, Franck Dernoncourt

Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and fundamental importance of GUI agents, we provide a comprehensive survey that categorizes their benchmarks, evaluation metrics, architectures, and training methods. We propose a unified framework that delineates their perception, reasoning, planning, and acting capabilities. Furthermore, we identify important open challenges and discuss key future directions. Finally, this work serves as a basis for practitioners and researchers to gain an intuitive understanding of current progress, techniques, benchmarks, and critical open problems that remain to be addressed.