Decision making demands an intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making suffer from low sample efficiency and poor generalization. In contrast, foundation models in language and vision have shown rapid adaptation to diverse new tasks. We therefore advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by a formulation of foundation agents, together with their fundamental characteristics and challenges, motivated by the success of large language models (LLMs). Moreover, we specify a roadmap for foundation agents, from large-scale interactive data collection or generation, to self-supervised pretraining and adaptation, to knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to advance the field.