This paper presents Diver Interest via Pointing in Three Dimensions (DIP-3D), a method by which a diver indicates an object of interest to an autonomous underwater vehicle (AUV) by pointing. DIP-3D incorporates three-dimensional distance information to discriminate between multiple objects in the AUV's camera image. Traditional dense stereo vision is difficult to apply to underwater distance estimation because scene features are relatively lacking in saliency and lighting is degraded. Yet distance information is necessary for robotic perception of diver pointing when multiple objects appear within the robot's image plane. We sidestep the challenges of underwater distance estimation by performing pose estimation on both the left and right images from the robot's stereo camera and reconstructing only the resulting sparse set of keypoints. The triangulated pose keypoints, together with a classical object detection method, enable DIP-3D to infer the location of the object of interest when multiple objects are in the AUV's field of view. By letting the scuba diver point at an arbitrary object of interest and enabling the AUV to decide autonomously which object the diver is pointing to, this method will permit more natural interaction between AUVs and human scuba divers in underwater human-robot collaborative tasks.
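To make the role of distance information concrete, the following is a minimal sketch of the idea, not the paper's implementation. It assumes a rectified pinhole stereo pair with hypothetical intrinsics (`f`, `cx`, `cy`, `baseline`), assumes the pointing direction can be approximated by the ray from a triangulated elbow keypoint through a triangulated wrist keypoint, and assumes the selection rule is "closest object center to that ray". None of these specifics are stated in the abstract, and the function names are illustrative only.

```python
import numpy as np


def triangulate_rectified(uv_left, uv_right, f, cx, cy, baseline):
    """Triangulate one keypoint matched across a rectified stereo pair.

    Assumes an ideal pinhole model: focal length f (pixels), principal
    point (cx, cy), and horizontal baseline in metres. Returns the 3D
    point in the left camera frame.
    """
    disparity = uv_left[0] - uv_right[0]
    if disparity <= 0:
        raise ValueError("non-positive disparity; keypoint cannot be triangulated")
    z = f * baseline / disparity
    x = (uv_left[0] - cx) * z / f
    y = (uv_left[1] - cy) * z / f
    return np.array([x, y, z])


def select_pointed_object(elbow, wrist, object_centers):
    """Return the index of the object center closest to the pointing ray.

    The ray starts at the wrist and extends along the elbow-to-wrist
    direction (an assumed pointing model). Objects behind the hand are
    ignored. Returns None if no object lies in front of the hand.
    """
    direction = wrist - elbow
    direction = direction / np.linalg.norm(direction)
    best_index, best_distance = None, np.inf
    for index, center in enumerate(object_centers):
        relative = center - wrist
        along = relative @ direction  # signed distance along the ray
        if along <= 0:
            continue  # object is behind the pointing hand
        distance = np.linalg.norm(relative - along * direction)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index
```

For instance, with `f = 500`, `cx = 320`, `cy = 240`, two object centers at `(1.2, 0, 3.2)` and `(2.4, 0, 6.4)` metres project to the same left-image pixel, so a purely two-dimensional pointing cue cannot separate them. With an elbow at `(0, 0, 2)` and a wrist at `(0.2, 0, 2.2)`, only the nearer center lies on the 3D pointing ray, which is the kind of ambiguity that triangulated keypoints are meant to resolve.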