In the rapidly advancing field of robotics, the fusion of state-of-the-art visual technologies with mobile robotic arms has emerged as a critical integration. This paper introduces a novel system that combines the Segment Anything Model (SAM), a transformer-based visual foundation model, with a robotic arm on a mobile platform. Mounting a depth camera on the robotic arm's end-effector enables continuous object tracking, significantly mitigating environmental uncertainties. By deploying on a mobile platform, our grasping system gains enhanced mobility, playing a key role in dynamic environments where adaptability is critical. This synthesis enables dynamic object segmentation, tracking, and grasping. It also elevates user interaction beyond traditional robotic systems, allowing the robot to respond intuitively to various modalities such as clicks, drawings, or voice commands. Empirical assessments in both simulated and real-world environments demonstrate the system's capabilities. This configuration opens avenues for wide-ranging applications, from industrial settings, agriculture, and household tasks to specialized assignments and beyond.