Depth estimation plays an important role in robotic perception systems. The self-supervised monocular paradigm has gained significant attention because it frees training from reliance on depth annotations. Despite recent advances, existing self-supervised methods still underutilize the available training data, which limits their generalization ability. In this paper, we adopt two data augmentation techniques, Resizing-Cropping and Splitting-Permuting, to fully exploit the potential of training datasets. Specifically, the original image and the two generated augmented images are fed into the training pipeline simultaneously, and we use them to conduct self-distillation. Additionally, we introduce a detail-enhanced DepthNet, with an extra full-scale branch in the encoder and a grid decoder, to improve the restoration of fine details in depth maps. Experimental results demonstrate that our method achieves state-of-the-art performance on the KITTI benchmark, evaluated against both the raw and the improved ground truth. Moreover, our models show superior generalization when transferred to the Make3D and NYUv2 datasets. Our code is available at https://github.com/Sauf4896/BDEdepth.
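The abstract names the two augmentations without giving their details, so the following is only a minimal sketch of what operations of this kind could look like, not the authors' implementation. It assumes that Resizing-Cropping upscales the image and crops a window of the original size, and that Splitting-Permuting cuts the image at one row and one column and swaps the resulting blocks. The function names, parameters, nearest-neighbour resizing, and the list-of-rows image representation are all hypothetical choices made to keep the example dependency-free.

```python
def resize_crop(image, scale, top, left):
    """Upscale by `scale` (nearest neighbour), then crop a window of the
    original size whose top-left corner is (top, left) in the upscaled image.
    Assumed behaviour; the abstract does not specify the exact procedure."""
    h, w = len(image), len(image[0])
    new_h, new_w = round(h * scale), round(w * scale)
    if scale < 1 or top < 0 or left < 0 or top + h > new_h or left + w > new_w:
        raise ValueError("crop window must lie inside the upscaled image")
    # Map each output pixel back to its nearest source pixel.
    rows = [min(int(r / scale), h - 1) for r in range(top, top + h)]
    cols = [min(int(c / scale), w - 1) for c in range(left, left + w)]
    return [[image[r][c] for c in cols] for r in rows]


def split_permute(image, row_split, col_split):
    """Cut the image at `row_split` and `col_split`, then swap the two
    vertical parts and the two horizontal parts (assumed permutation)."""
    rows = image[row_split:] + image[:row_split]
    return [row[col_split:] + row[:col_split] for row in rows]


def inverse_split_permute(image, row_split, col_split):
    """Undo split_permute, e.g. to realign a prediction made on the
    permuted image with one made on the original image."""
    h, w = len(image), len(image[0])
    return split_permute(image, h - row_split, w - col_split)


# Toy 6x8 single-channel "image" whose pixel value encodes its position.
img = [[10 * r + c for c in range(8)] for r in range(6)]
permuted = split_permute(img, 2, 3)
restored = inverse_split_permute(permuted, 2, 3)
zoomed = resize_crop(img, 2.0, 1, 2)
```

An invertible permutation is a convenient property for self-distillation, because a depth map predicted from the permuted view can be mapped back and compared pixel by pixel with the prediction from the original view. Whether the paper relies on this exact mechanism is an assumption here.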