Depth completion and object detection are two crucial tasks in aerial 3D mapping, path planning, and collision avoidance for Uncrewed Aerial Vehicles (UAVs). A common solution relies on measurements from a LiDAR sensor; however, the resulting point cloud is often sparse and irregular, which limits the system's capabilities in 3D rendering and safety-critical decision-making. To mitigate this challenge, information from other sensors on the UAV (viz., a camera used for object detection) is used to help the depth completion process generate denser 3D models. Performing both aerial depth completion and object detection while fusing the data from the two sensors makes resource efficiency a challenge. We address this challenge by proposing a novel approach that executes the two tasks jointly in a single pass. The proposed method is based on an encoder-focused multi-task learning model that exposes the two tasks to jointly learned features. We demonstrate how semantic expectations of the objects in the scene, learned by the object detection pathway, can boost the performance of the depth completion pathway when it fills in missing depth values. Experimental results show that the proposed multi-task network outperforms its single-task counterpart, particularly when exposed to defective inputs.
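The encoder-focused multi-task idea described in the abstract can be illustrated with a toy sketch: one shared encoder consumes the fused camera and sparse-depth input once, and two lightweight heads read the same features. This is not the authors' network. The class name `SharedEncoderMultiTask`, the layer sizes, the use of plain fully connected layers in place of a convolutional backbone, and the convention that `0.0` marks a missing LiDAR return are all assumptions made for illustration.

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(rows, cols, rng):
    """Small random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
               for _ in range(rows)]
    return weights, [0.0] * rows


class SharedEncoderMultiTask:
    """Hypothetical toy model: one shared encoder, two task heads."""

    def __init__(self, n_pixels, n_classes, hidden=16, seed=0):
        rng = random.Random(seed)
        in_dim = n_pixels * 4  # 3 RGB channels + 1 sparse depth value per pixel
        self.enc_w, self.enc_b = init_layer(hidden, in_dim, rng)
        self.depth_w, self.depth_b = init_layer(n_pixels, hidden, rng)
        self.det_w, self.det_b = init_layer(n_classes, hidden, rng)
        self.encoder_calls = 0  # counts how often the encoder runs

    def encode(self, rgb, sparse_depth):
        """Fuse both sensor inputs and compute the shared features."""
        self.encoder_calls += 1
        x = [c for pixel in rgb for c in pixel] + list(sparse_depth)
        return relu(linear(x, self.enc_w, self.enc_b))

    def forward(self, rgb, sparse_depth):
        """Single pass: encode once, then run both heads on the same features."""
        feats = self.encode(rgb, sparse_depth)
        dense_depth = linear(feats, self.depth_w, self.depth_b)
        class_scores = linear(feats, self.det_w, self.det_b)
        return dense_depth, class_scores


# Usage: a 4-pixel "image" with two missing depth readings (assumed 0.0).
model = SharedEncoderMultiTask(n_pixels=4, n_classes=3)
rgb = [(0.2, 0.4, 0.6)] * 4
sparse_depth = [5.0, 0.0, 0.0, 7.5]
dense_depth, class_scores = model.forward(rgb, sparse_depth)
```

Because both heads branch from the same features, the encoder cost is paid once per frame, and in training the gradients from both task losses would update the shared encoder. That shared update is the mechanism by which detection cues could inform depth completion in a design of this kind.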