Bimanual mobile manipulation requires tight integration of high-level semantic reasoning with safe, compliant physical interaction. End-to-end models handle this integration opaquely, and classical controllers lack the semantic context to address it. This paper presents GenerativeMPC, a hierarchical cyber-physical framework that explicitly maps semantic scene understanding to physical control parameters for bimanual mobile manipulators. The system uses a Vision-Language Model with Retrieval-Augmented Generation (VLM-RAG) to translate visual and linguistic context into grounded control constraints, specifically dynamic velocity limits and safety margins for a whole-body Model Predictive Controller (MPC). In parallel, the VLM-RAG module modulates the virtual stiffness and damping gains of a unified impedance-admittance controller, enabling context-aware compliance during human-robot interaction. The framework uses an experience-driven vector database to keep parameter grounding consistent without retraining. Experiments in MuJoCo, IsaacSim, and on a physical bimanual platform show a 60% speed reduction near humans, along with safe, socially aware navigation and manipulation achieved through semantic-to-physical parameter grounding. This work advances the field of human-centric cybernetics by grounding large-scale cognitive models in predictable, high-frequency physical control loops.
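To make the semantic-to-physical grounding step concrete, the following is a minimal sketch of how a retrieved experience could be turned into controller parameters. It is not the authors' implementation: the `ControlParams` fields, the three-dimensional toy embeddings, the `EXPERIENCES` entries, and every numeric value are assumptions chosen for illustration. Only the 60% speed reduction near humans is taken from the text above, and the embedding vectors stand in for whatever representation the VLM actually produces.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlParams:
    """Hypothetical parameter set handed to the low-level controllers."""
    max_velocity: float   # m/s, velocity limit for the whole-body MPC
    safety_margin: float  # m, minimum clearance constraint for the MPC
    stiffness: float      # N/m, virtual stiffness of the compliance controller
    damping: float        # N*s/m, virtual damping of the compliance controller


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


NOMINAL_VELOCITY = 1.0  # m/s, assumed nominal limit (not from the paper)

# Toy "experience-driven vector database": (context embedding, parameters).
# Embeddings and gains are made up; a real system would store VLM embeddings.
EXPERIENCES = [
    ([1.0, 0.0, 0.0], ControlParams(NOMINAL_VELOCITY, 0.2, 800.0, 60.0)),        # free space
    ([0.0, 1.0, 0.0], ControlParams(0.4 * NOMINAL_VELOCITY, 0.6, 200.0, 40.0)),  # human nearby: 60% slower
    ([0.0, 0.0, 1.0], ControlParams(0.6 * NOMINAL_VELOCITY, 0.4, 400.0, 50.0)),  # fragile object
]


def retrieve_params(query_embedding):
    """Return the parameters of the most similar stored experience."""
    _, params = max(EXPERIENCES, key=lambda e: cosine(query_embedding, e[0]))
    return params


if __name__ == "__main__":
    # A scene embedding that mostly resembles the "human nearby" experience.
    print(retrieve_params([0.1, 0.9, 0.2]))
```

Because the parameters come from a lookup rather than from free-form generation, the same scene context always yields the same limits and gains, which is one plausible reading of how a vector database can keep grounding consistent without retraining.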