This paper introduces a deep transformer network for estimating the relative 6D pose of an Unmanned Aerial Vehicle (UAV) with respect to a ship from monocular images. A synthetic dataset of ship images is created and annotated with 2D keypoints of multiple ship parts. A Transformer neural network is trained to detect these keypoints and estimate the 6D pose of each part, and the per-part estimates are integrated using Bayesian fusion. The model is evaluated on synthetic data and in in-situ flight experiments, demonstrating robustness and accuracy under various lighting conditions. The position estimation error is approximately 0.8\% of the distance to the ship on the synthetic data and 1.0\% in the flight experiments. The method has potential applications in ship-based autonomous UAV landing and navigation.
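The Bayesian fusion step can be illustrated with standard inverse-covariance weighting of Gaussian estimates. This is a minimal sketch assuming Gaussian position uncertainty per ship part; the paper's exact 6D formulation (including how rotations are fused) is not specified here, and the function name is hypothetical:

```python
import numpy as np

def fuse_estimates(means, covs):
    """Inverse-covariance (Bayesian) fusion of Gaussian position estimates.

    means: list of 3-vectors, one per ship part.
    covs:  list of 3x3 covariance matrices expressing each part's uncertainty.
    Returns the fused mean and fused covariance. Hypothetical sketch only;
    the paper's actual fusion of full 6D poses may differ.
    """
    infos = [np.linalg.inv(c) for c in covs]      # information matrices
    fused_cov = np.linalg.inv(sum(infos))         # combined uncertainty shrinks
    fused_mean = fused_cov @ sum(
        info @ np.asarray(m, dtype=float) for info, m in zip(infos, means)
    )
    return fused_mean, fused_cov

# Example: two equally confident estimates of the UAV-to-part translation
mean, cov = fuse_estimates(
    [[0.0, 0.0, 10.0], [2.0, 0.0, 10.0]],
    [np.eye(3), np.eye(3)],
)
```

With equal covariances the fusion reduces to a simple average, and the fused covariance is half of each input's, reflecting the gain from combining independent measurements.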