In medical and industrial domains, providing guidance for assembly processes is critical to ensure efficiency and safety. Errors in assembly can have significant consequences, such as extended surgery times in medicine and prolonged manufacturing or maintenance times in industry. Assembly scenarios can benefit from in-situ AR visualization to provide guidance, reduce assembly times, and minimize errors. To enable in-situ visualization, 6D pose estimation can be leveraged. Existing 6D pose estimation techniques primarily focus on individual objects and static captures. Assembly scenarios, however, involve various dynamics, including occlusion during assembly and changes in the assembly objects' appearance. Existing work combining object detection/6D pose estimation with assembly state detection either relies on purely deep learning-based approaches or limits assembly state detection to building blocks. To address the challenges of 6D pose estimation combined with assembly state detection, our approach ASDF builds upon the strengths of YOLOv8, a real-time-capable object detection framework. We extend this framework, refine the object pose, and fuse pose knowledge with network-detected pose information. The late fusion in our Pose2State module yields refined 6D pose estimates and assembly state detections. By combining pose and state information, the Pose2State module predicts the final assembly state with high precision. Our evaluation on our ASDF dataset shows that the Pose2State module improves assembly state detection and that this improvement in turn leads to more robust 6D pose estimation. Moreover, on the GBOT dataset, we outperform the purely deep learning-based network as well as the hybrid and purely tracking-based approaches.