We propose Evolution 6.0, a new concept that represents the evolution of robotics driven by Generative AI. When a robot lacks the tools needed to accomplish a task requested by a human, it autonomously designs the required instruments and learns how to use them to achieve the goal. Evolution 6.0 is an autonomous robotic system powered by Vision-Language Models (VLMs), Vision-Language-Action (VLA) models, and Text-to-3D generative models for tool design and task execution. The system comprises two key modules: the Tool Generation Module, which fabricates task-specific tools from visual and textual data, and the Action Generation Module, which converts natural language instructions into robotic actions. It integrates QwenVLM for environmental understanding, OpenVLA for task execution, and Llama-Mesh for 3D tool generation. Evaluation results show a 90% success rate for tool generation with a 10-second inference time, while action generation achieves 83.5% in physical and visual generalization, 70% in motion generalization, and 37% in semantic generalization. Future improvements will focus on bimanual manipulation, expanded task capabilities, and enhanced environmental interpretation to improve real-world adaptability.
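The two-module structure described in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: `run_pipeline`, `Plan`, the `missing_tool` key, and the three stub functions are hypothetical names, and the stubs merely stand in for the VLM, the Text-to-3D model, and the VLA model so that the control flow runs without any model dependencies. The step in which the robot learns to use the generated tool is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plan:
    """Result of one pass through the hypothetical two-module pipeline."""
    tool_mesh: Optional[str]  # mesh as text; None when no tool was needed
    actions: List[str]        # placeholder for low-level robot actions


def run_pipeline(
    instruction: str,
    image: bytes,
    describe_scene: Callable[[bytes, str], dict],         # stands in for the VLM
    generate_mesh: Callable[[str], str],                  # stands in for the Text-to-3D model
    generate_actions: Callable[[bytes, str], List[str]],  # stands in for the VLA model
) -> Plan:
    # Tool Generation Module: ask the scene model whether a tool is missing
    # and, if so, request a mesh for it.
    scene = describe_scene(image, instruction)
    tool_mesh = None
    if scene.get("missing_tool"):
        tool_mesh = generate_mesh(scene["missing_tool"])

    # Action Generation Module: turn the instruction into robot actions.
    actions = generate_actions(image, instruction)
    return Plan(tool_mesh=tool_mesh, actions=actions)


# Stub models (assumptions for illustration only).
def fake_vlm(image: bytes, instruction: str) -> dict:
    return {"missing_tool": "spatula"} if "flip" in instruction else {}


def fake_text_to_3d(prompt: str) -> str:
    return f"# OBJ mesh for {prompt}\nv 0 0 0\n"


def fake_vla(image: bytes, instruction: str) -> List[str]:
    return [f"execute: {instruction}"]


plan = run_pipeline("flip the pancake", b"", fake_vlm, fake_text_to_3d, fake_vla)
no_tool_plan = run_pipeline("push the cube", b"", fake_vlm, fake_text_to_3d, fake_vla)
```

Passing the models in as callables keeps the sketch independent of any particular model API; in a real system each stub would be replaced by a wrapper around the corresponding model.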