In cluttered scenes with inevitable occlusions and incomplete observations, selecting informative viewpoints is essential for building a reliable representation. In this context, 3D Gaussian Splatting (3DGS) offers a distinct advantage, as it can explicitly guide the selection of subsequent viewpoints and then refine the representation with new observations. However, existing approaches rely solely on geometric cues, neglect manipulation-relevant semantics, and tend to prioritize exploitation over exploration. To tackle these limitations, we introduce an instance-aware Next Best View (NBV) policy that prioritizes underexplored regions by leveraging object features. Specifically, our object-aware 3DGS distills instance-level information into one-hot object vectors, which are used to compute a confidence-weighted information gain that guides the identification of regions associated with erroneous and uncertain Gaussians. Furthermore, our method can be easily adapted to an object-centric NBV, which focuses view selection on a target object, thereby improving reconstruction robustness to object placement. Experiments demonstrate that our NBV policy reduces depth error by up to 77.14% on the synthetic dataset and 34.10% on the real-world GraspNet dataset compared to baselines. Moreover, compared to targeting the entire scene, performing NBV on a specific object yields an additional reduction of 25.60% in depth error for that object. We further validate the effectiveness of our approach through real-world robotic manipulation tasks.
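To make the view-selection idea concrete, the following is a minimal illustrative sketch (not the authors' implementation) of scoring candidate viewpoints by a confidence-weighted information gain. It assumes each Gaussian carries a per-object probability vector and a confidence score, and uses entropy as the uncertainty measure; the function and variable names, the entropy-based gain, and the binary visibility mask are all assumptions for illustration.

```python
import numpy as np

def confidence_weighted_gain(object_probs, confidence, visibility):
    """Score candidate views by how much uncertain content they observe.

    object_probs: (G, K) per-Gaussian object distributions (rows sum to 1).
    confidence:   (G,)   per-Gaussian confidence in [0, 1].
    visibility:   (V, G) 0/1 mask of Gaussians visible from each candidate view.
    Returns:      (V,)   gain per view; higher means more erroneous/uncertain
                         Gaussians are seen, favoring exploration.
    """
    eps = 1e-12
    # Entropy of each Gaussian's object distribution (semantic uncertainty).
    entropy = -np.sum(object_probs * np.log(object_probs + eps), axis=1)
    # Down-weight confident Gaussians: low confidence => high potential gain.
    per_gaussian = (1.0 - confidence) * entropy
    # Aggregate the gain of all Gaussians visible from each view.
    return visibility @ per_gaussian

# Tiny usage example: 3 Gaussians, 2 object classes, 2 candidate views.
probs = np.array([[1.0, 0.0],   # certain assignment
                  [0.5, 0.5],   # maximally uncertain
                  [0.9, 0.1]])  # mildly uncertain
conf = np.array([0.9, 0.2, 0.5])
vis = np.array([[1, 1, 0],      # view 0 sees the uncertain Gaussian
                [1, 0, 1]])     # view 1 does not
gains = confidence_weighted_gain(probs, conf, vis)
best_view = int(np.argmax(gains))
```

Under these assumptions, the view covering the low-confidence, high-entropy Gaussian receives the larger gain, so the policy steers the camera toward underexplored regions rather than re-observing well-reconstructed ones.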