Large Action Models: From Inception to Implementation

Lu Wang, Fangkai Yang, Chaoyun Zhang, Junting Lu, Jiaxu Qian, Shilin He, Pu Zhao, Bo Qiao, Ray Huang, Si Qin, Qisheng Su, Jiayi Ye, Yudi Zhang, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

From arXiv; 25 pages, 12 figures

As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence. In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide on the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications. The code for the data collection process utilized in this paper is publicly available at: https://github.com/microsoft/UFO/tree/main/dataflow, and comprehensive documentation can be found at https://microsoft.github.io/UFO/dataflow/overview/.
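The abstract's central distinction is that an LAM outputs an action, not text, and that action must be grounded in a concrete environment and then executed. The following is a minimal sketch of that grounding-and-execution step under purely illustrative assumptions. The `Action` schema, the `ToyEnvironment` class, and the `ground` and `run` helpers are hypothetical names, not the paper's or UFO's API. A dictionary of labeled controls stands in for a real Windows GUI.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """A structured action as an LAM might emit it (hypothetical schema)."""
    function: str                  # operation name, e.g. "click" or "type_text"
    target: str                    # label of the UI control the action applies to
    args: Dict[str, str] = field(default_factory=dict)


@dataclass
class ToyEnvironment:
    """Hypothetical stand-in for a GUI: labeled controls that hold text."""
    controls: Dict[str, str]
    log: List[str] = field(default_factory=list)

    def click(self, target: str) -> None:
        self.log.append(f"clicked {target}")

    def type_text(self, target: str, text: str) -> None:
        self.controls[target] = text
        self.log.append(f"typed into {target}")


def ground(action: Action, env: ToyEnvironment) -> Callable[[], None]:
    """Map an abstract action to an executable call on a real control.

    Raises ValueError if the target control does not exist or the
    operation is not supported, so nothing is executed for bad output.
    """
    if action.target not in env.controls:
        raise ValueError(f"unknown control: {action.target}")
    if action.function == "click":
        return lambda: env.click(action.target)
    if action.function == "type_text":
        text = action.args.get("text", "")
        return lambda: env.type_text(action.target, text)
    raise ValueError(f"unsupported function: {action.function}")


def run(actions: List[Action], env: ToyEnvironment) -> int:
    """Ground and execute actions in order; stop at the first that fails to ground.

    Returns the number of actions that were actually executed.
    """
    executed = 0
    for action in actions:
        try:
            step = ground(action, env)
        except ValueError as err:
            env.log.append(f"grounding failed: {err}")
            break
        step()
        executed += 1
    return executed


# Usage: a three-step plan whose last step names a control that does not exist.
env = ToyEnvironment(controls={"Search box": "", "Search button": ""})
plan = [
    Action("type_text", "Search box", {"text": "quarterly report"}),
    Action("click", "Search button"),
    Action("click", "Missing button"),
]
n = run(plan, env)
print(n)                           # 2
print(env.controls["Search box"])  # quarterly report
```

The sketch stops at the first action it cannot ground instead of guessing a nearby control, so a malformed model output cannot trigger an unintended operation. This is one design choice among several. A real agent might instead return the error to the model and ask it to replan.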
