Robot design is a nontrivial process that requires balancing multiple criteria, including user specifications, kinematic structure, and visual appearance. As a result, it often relies heavily on domain expertise and significant human effort. Most current methods are rule-based: they require specifying a grammar or a set of primitive components and modules that can be composed into a design. We propose RobotDesignGPT, a novel automated robot design framework that leverages the general knowledge and reasoning capabilities of large pre-trained vision-language models to automate robot design synthesis. The framework synthesizes an initial robot design from a simple user prompt and a reference image. Our novel visual feedback approach greatly improves design quality and reduces unnecessary manual feedback. We demonstrate that the framework can design visually appealing and kinematically valid robots inspired by nature, ranging from legged animals to flying creatures. We validate the proposed framework with an ablation study and a user study.