Decision making demands an intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making suffer from low sample efficiency and poor generalization. In contrast, foundation models in language and vision have demonstrated rapid adaptation to diverse new tasks. We therefore advocate the construction of foundation agents as a transformative shift in the learning paradigm of agents. Motivated by the success of large language models (LLMs), we ground this proposal in a formulation of foundation agents that sets out their fundamental characteristics and challenges. We then specify a roadmap for foundation agents: from large-scale collection or generation of interactive data, to self-supervised pretraining and adaptation, to knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and, drawing on real-world use cases, delineate trends for foundation agents across both technical and theoretical aspects, with the aim of propelling the field towards a more comprehensive and impactful future.