To complete assignments given by humans in natural language, robots must interpret commands, generate and answer relevant questions for scene understanding, and manipulate target objects. Real-world deployments often require multiple heterogeneous robots with different manipulation capabilities to handle different assignments cooperatively. Beyond specialized manipulation skills, effective information gathering is essential to completing these assignments. To address this component of the problem, we formalize the information-gathering process in a fully cooperative setting as an underexplored multi-agent multi-task Embodied Question Answering (MM-EQA) problem, a novel extension of canonical Embodied Question Answering (EQA) in which effective communication is crucial for coordinating effort without redundancy. To solve this problem, we propose CommCP, a novel LLM-based decentralized communication framework designed for MM-EQA. CommCP employs conformal prediction to calibrate the generated messages, minimizing receiver distraction and enhancing communication reliability. To evaluate our framework, we introduce an MM-EQA benchmark featuring diverse, photo-realistic household scenarios with embodied questions. Experimental results demonstrate that CommCP significantly improves task success rate and exploration efficiency over baselines. The experiment videos, code, and dataset are available on our project website: https://comm-cp.github.io.
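To make the calibration step concrete: the abstract does not specify CommCP's exact procedure, but a standard way to calibrate generated messages with conformal prediction is split conformal classification, where a held-out calibration set turns raw model confidences into prediction sets with a coverage guarantee. The sketch below is a generic illustration under that assumption, using synthetic scores rather than any model or data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: rows are calibration examples, columns are candidate labels
# (e.g., candidate message contents). These Dirichlet draws stand in for
# softmax confidences from an actual model.
n_cal, n_labels = 500, 5
cal_scores = rng.dirichlet(np.ones(n_labels), size=n_cal)
cal_labels = rng.integers(0, n_labels, size=n_cal)

# Nonconformity score: 1 minus the model's confidence in the true label.
nonconformity = 1.0 - cal_scores[np.arange(n_cal), cal_labels]

# Conformal quantile for a target miscoverage rate alpha: with probability
# at least 1 - alpha, the true label's nonconformity falls below qhat.
alpha = 0.1
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(nonconformity, q_level, method="higher")

def prediction_set(scores, qhat):
    """Keep every label whose nonconformity (1 - score) is within the threshold."""
    return np.flatnonzero(1.0 - scores <= qhat)

# At test time, only labels in the set are considered reliable enough to send.
test_scores = rng.dirichlet(np.ones(n_labels))
kept = prediction_set(test_scores, qhat)
print(qhat, kept)
```

Under this scheme, a small prediction set signals a confident, informative message, while a large set flags uncertainty; filtering transmissions on set size is one plausible way such calibration could reduce distracting messages to receivers.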