We present an active mapping system that plans long-horizon exploration goals and short-term actions with a 3D Gaussian Splatting (3DGS) representation. Existing methods either do not take advantage of recent developments in multimodal Large Language Models (LLMs) or do not account for localization uncertainty, which is critical for embodied agents. We propose employing multimodal LLMs for long-horizon planning in conjunction with detailed motion planning using our information-based algorithm. Leveraging high-quality view synthesis from the 3DGS representation, our method employs a multimodal LLM as a zero-shot planner that sets long-horizon exploration goals from a semantic perspective. We also introduce an uncertainty-aware path proposal and selection algorithm that balances the dual objectives of maximizing information gain about the environment and minimizing the cost of localization errors. Experiments on the Gibson and Habitat-Matterport 3D datasets demonstrate state-of-the-art performance of the proposed method.
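The path selection objective described above can be sketched as a simple weighted trade-off. This is a minimal, hypothetical illustration assuming the candidate scoring takes the form "information gain minus a weighted localization-uncertainty penalty"; the function names (`info_gain`, `loc_cost`) and the weight `lambda_u` are illustrative assumptions, not the paper's actual formulation.

```python
def select_path(paths, info_gain, loc_cost, lambda_u=0.5):
    """Pick the candidate path that maximizes information gain
    minus a weighted localization-uncertainty cost.

    Hypothetical sketch; `info_gain` and `loc_cost` stand in for the
    map-based and uncertainty-based terms described in the abstract.
    """
    best, best_score = None, float("-inf")
    for p in paths:
        score = info_gain(p) - lambda_u * loc_cost(p)
        if score > best_score:
            best, best_score = p, score
    return best
```

With this scoring, a path with moderate information gain but low expected localization error can outrank a higher-gain path that would degrade pose estimates.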