As AI continues to advance, there is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions. This evolution requires the transition from traditional Large Language Models (LLMs), which excel at generating textual responses, to Large Action Models (LAMs), designed for action generation and execution within dynamic environments. Enabled by agent systems, LAMs hold the potential to transform AI from passive language understanding to active task completion, marking a significant milestone in the progression toward artificial general intelligence. In this paper, we present a comprehensive framework for developing LAMs, offering a systematic approach to their creation, from inception to deployment. We begin with an overview of LAMs, highlighting their unique characteristics and delineating their differences from LLMs. Using a Windows OS-based agent as a case study, we provide a detailed, step-by-step guide on the key stages of LAM development, including data collection, model training, environment integration, grounding, and evaluation. This generalizable workflow can serve as a blueprint for creating functional LAMs in various application domains. We conclude by identifying the current limitations of LAMs and discussing directions for future research and industrial deployment, emphasizing the challenges and opportunities that lie ahead in realizing the full potential of LAMs in real-world applications. The code for the data collection process utilized in this paper is publicly available at: https://github.com/microsoft/UFO/tree/main/dataflow, and comprehensive documentation can be found at https://microsoft.github.io/UFO/dataflow/overview/.