Fringe Projection-Based Vision Pipeline for Autonomous Hard Drive Disassembly

Unrecovered e-waste represents a significant economic loss. Hard disk drives (HDDs) constitute a valuable e-waste stream that motivates robotic disassembly. Automating HDD disassembly requires holistic 3D sensing, scene understanding, and fastener localization. Current methods, however, are fragmented, lack robust 3D sensing, and do not localize fasteners. We propose an autonomous vision pipeline with three parts: it performs 3D sensing with a Fringe Projection Profilometry (FPP) module, selectively triggers a depth completion module where FPP fails, and integrates this depth module with a lightweight, real-time instance segmentation network for scene understanding and critical component localization. The same FPP camera-projector system serves both the depth sensing and component localization modules. As a result, our depth maps and derived 3D geometry are inherently pixel-wise aligned with the segmentation masks without registration, which is an advantage over the RGB-D perception systems common in industrial sensing. We optimize both trained networks (depth completion and instance segmentation) for deployment-oriented inference. For instance segmentation, the proposed system achieves a box mAP@50 of 0.960 and a mask mAP@50 of 0.957. The selected depth completion configuration, which uses the Depth Anything V2 Base backbone, achieves an RMSE of 2.317 mm and an MAE of 1.836 mm. On the evaluation workstation, the Platter Facing learned inference stack achieves a combined latency of 12.86 ms and a throughput of 77.7 frames per second (FPS). Finally, we adopt a sim-to-real transfer learning approach to augment our physical dataset. The proposed perception pipeline provides high-fidelity semantic and spatial data that can support downstream robotic disassembly. The synthetic dataset developed for HDD instance segmentation will be made publicly available.
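The abstract does not state how FPP failure is detected, how the two depth sources are combined, or how the error metrics are computed. The sketch below illustrates one plausible version of these steps, and every detail in it is an assumption. It uses standard N-step phase shifting, I_n = A + B*cos(phi + 2*pi*n/N), and treats the fringe modulation B as a per-pixel reliability cue. A hypothetical threshold rule decides when to trigger depth completion. Fusion happens pixel by pixel, which is only possible because depth and masks share one camera-projector geometry. RMSE and MAE are computed over valid ground-truth pixels. Phase unwrapping and phase-to-depth calibration are omitted. All function names and threshold values are hypothetical and are not taken from the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]
Mask = List[List[bool]]


def wrapped_phase(frames: Sequence[Image]) -> Tuple[Image, Image]:
    """Standard N-step phase shifting (assumed model: I_n = A + B*cos(phi + 2*pi*n/N)).

    Returns the wrapped phase phi in (-pi, pi] and the fringe modulation B per pixel.
    """
    n_steps = len(frames)
    if n_steps < 3:
        raise ValueError("need at least 3 phase-shifted frames")
    h, w = len(frames[0]), len(frames[0][0])
    phase = [[0.0] * w for _ in range(h)]
    modulation = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            deltas = [2 * math.pi * n / n_steps for n in range(n_steps)]
            s = sum(frames[n][r][c] * math.sin(d) for n, d in enumerate(deltas))
            k = sum(frames[n][r][c] * math.cos(d) for n, d in enumerate(deltas))
            phase[r][c] = math.atan2(-s, k)                     # S = -(N/2)B sin(phi), K = (N/2)B cos(phi)
            modulation[r][c] = 2.0 / n_steps * math.hypot(s, k)  # recovers B
    return phase, modulation


def validity_mask(modulation: Image, min_modulation: float = 5.0) -> Mask:
    """Hypothetical rule: pixels with weak fringe contrast (dark, saturated, specular) are invalid."""
    return [[b >= min_modulation for b in row] for row in modulation]


def should_trigger_completion(valid: Mask, max_invalid_fraction: float = 0.02) -> Tuple[bool, float]:
    """Hypothetical selective trigger: run depth completion only if too many FPP pixels failed."""
    total = sum(len(row) for row in valid)
    invalid = sum(1 for row in valid for v in row if not v)
    fraction = invalid / total
    return fraction > max_invalid_fraction, fraction


def fuse_depth(fpp_depth: Image, completed_depth: Image, valid: Mask) -> Image:
    """Keep FPP depth where valid and fill the rest from the completion network.

    Pixel-wise selection is sound only because both maps share one camera-projector geometry.
    """
    return [
        [f if v else c for f, c, v in zip(f_row, c_row, v_row)]
        for f_row, c_row, v_row in zip(fpp_depth, completed_depth, valid)
    ]


def rmse_mae(pred: Image, gt: Image, valid: Mask) -> Tuple[float, float]:
    """RMSE and MAE (same units as the depth maps, e.g. mm) over pixels with valid ground truth."""
    errors = [p - g for p_row, g_row, v_row in zip(pred, gt, valid)
              for p, g, v in zip(p_row, g_row, v_row) if v]
    if not errors:
        raise ValueError("no valid pixels")
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae
```

In this sketch, the completion network would run only when `should_trigger_completion` returns `True`. This mirrors the abstract's "selective triggering": the learned model costs nothing on frames where FPP already succeeds.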
