Bin-picking of metal objects using low-cost RGB-D cameras often suffers from sparse depth information and reflective surface textures, leading to pose estimation errors and a reliance on manual labeling. To reduce human intervention, we propose a two-stage framework consisting of a metric learning stage and a self-training stage. Specifically, to automatically process the data captured by a low-cost camera (LC), we introduce a Multi-object Pose Reasoning (MoPR) algorithm that optimizes pose hypotheses under depth, collision, and boundary constraints. To further refine the pose candidates, we adopt a Symmetry-aware Lie-group-based Bayesian Gaussian Mixture Model (SaL-BGMM), integrated with the Expectation-Maximization (EM) algorithm, for symmetry-aware filtering. We also propose a Weighted Ranking Information Noise Contrastive Estimation (WR-InfoNCE) loss that enables the LC to learn a perceptual metric from reconstructed data, supporting self-training on untrained or even unseen objects. Experimental results show that our approach outperforms several state-of-the-art methods on both the ROBI dataset and our newly introduced Self-ROBI dataset.
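The exact form of the WR-InfoNCE loss is not given in this section; as a rough illustration of the general idea only, the minimal sketch below shows a standard InfoNCE-style contrastive loss in which each candidate's similarity is scaled by a per-candidate ranking weight. The function name `wr_infonce_loss`, the choice of multiplying similarities by the weights, and the convention that index 0 is the positive candidate are all assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def wr_infonce_loss(anchor, candidates, rank_weights, temperature=0.07):
    """Hypothetical sketch of a weighted-ranking InfoNCE-style loss.

    anchor:       (D,)   embedding of the query view
    candidates:   (N, D) embeddings of pose candidates; index 0 is treated
                  as the positive, the remaining entries as negatives
    rank_weights: (N,)   per-candidate weights reflecting ranking confidence
                  (the weighting scheme here is an assumption)
    """
    anchor = F.normalize(anchor, dim=-1)
    candidates = F.normalize(candidates, dim=-1)

    # Cosine similarities, scaled by the ranking weights and temperature.
    logits = rank_weights * (candidates @ anchor) / temperature  # (N,)

    # Standard InfoNCE objective: cross-entropy with the positive at index 0.
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits.unsqueeze(0), target)
```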