Robots are expected to play a major role in the future construction industry but face challenges due to high costs and difficulty adapting to dynamic tasks. This study explores the potential of foundation models to enhance the adaptability and generalizability of task planning in construction robots. Four models are proposed and implemented using lightweight, open-source large language models (LLMs) and vision language models (VLMs). They comprise one single-agent model and three multi-agent teams whose agents collaborate to create robot action plans. The models are evaluated across three construction roles: Painter, Safety Inspector, and Floor Tiling. Results show that the four-agent team outperforms the state-of-the-art GPT-4o on most metrics while being ten times more cost-effective. Additionally, the three-agent and four-agent teams demonstrate improved generalizability. By discussing how agent behaviors influence outputs, this study enhances the understanding of AI teams and supports future research in diverse unstructured environments beyond construction.