Depth completion is a long-standing challenge in computer vision, and classification-based methods have made substantial progress on it in recent years. However, most existing classification-based methods rely on pre-defined, pixel-shared, discrete depth values as depth categories. This representation fails to capture the continuous depth values that conform to the real depth distribution, which leads to depth smearing in boundary regions. To address this issue, we revisit depth completion from a clustering perspective and propose a novel clustering-based framework called CluDe, which focuses on learning a pixel-wise, continuous depth representation. The key idea of CluDe is to iteratively update the pixel-shared, discrete depth representation into its pixel-wise, continuous counterpart, driven by the real depth distribution. Specifically, CluDe first uses depth value clustering to learn a set of depth centers as the depth representation. Although these depth centers are still pixel-shared and discrete, they follow the real depth distribution more closely than pre-defined depth categories do. CluDe then estimates offsets for these depth centers, so that each center can be adjusted dynamically along the depth axis of the depth distribution to produce the pixel-wise, continuous depth representation. Extensive experiments demonstrate that CluDe reduces depth smearing around object boundaries by using this pixel-wise, continuous depth representation. Furthermore, CluDe achieves state-of-the-art performance on the VOID datasets and outperforms classification-based methods on the KITTI dataset.
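The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: plain 1-D k-means stands in for the learned depth value clustering, random arrays stand in for the per-pixel offsets and per-center probabilities that a network would predict, and the final depth is taken as a probability-weighted sum of the adjusted centers, which the abstract does not specify. The function names `cluster_depth_centers` and `pixelwise_depth` are hypothetical.

```python
import numpy as np


def cluster_depth_centers(depths, k, iters=20):
    """Stage 1 (assumed stand-in): 1-D k-means over observed depth values.

    Returns k pixel-shared, discrete depth centers, sorted ascending.
    """
    # Initialise the centers at evenly spaced quantiles of the observed depths.
    centers = np.quantile(depths, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign every depth value to its nearest center.
        assign = np.argmin(np.abs(depths[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            members = depths[assign == j]
            if members.size:  # keep the old center if a cluster is empty
                centers[j] = members.mean()
    return np.sort(centers)


def pixelwise_depth(centers, offsets, probs):
    """Stage 2 (assumed combination rule): shift centers per pixel, then blend.

    centers: (K,)      pixel-shared depth centers
    offsets: (H, W, K) per-pixel shift of each center along the depth axis
    probs:   (H, W, K) per-pixel weights over centers, summing to 1 over K
    """
    candidates = centers[None, None, :] + offsets  # pixel-wise, continuous
    return (probs * candidates).sum(axis=-1)       # (H, W) depth map


# Toy data: depth samples drawn around two surfaces at roughly 2 m and 8 m.
rng = np.random.default_rng(0)
sparse_depths = np.concatenate(
    [rng.normal(2.0, 0.1, 200), rng.normal(8.0, 0.3, 200)]
)
centers = cluster_depth_centers(sparse_depths, k=4)

# Random stand-ins for what a network would predict at each pixel.
H, W, K = 4, 5, centers.size
offsets = rng.uniform(-0.2, 0.2, size=(H, W, K))
logits = rng.normal(size=(H, W, K))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

depth = pixelwise_depth(centers, offsets, probs)
print(centers)
print(depth.shape)
```

In this sketch the centers concentrate where depth values actually occur, whereas uniformly pre-defined bins would spend most categories on empty ranges. The per-pixel offsets then let each pixel move those shared centers, so the output is not restricted to a fixed discrete set of values.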