Depth estimation from images now yields outstanding results, in terms of both in-domain accuracy and generalization. However, we identify two main challenges that remain open in this field: dealing with non-Lambertian materials and effectively processing high-resolution images. To this end, we propose a novel dataset that provides accurate, dense, high-resolution ground-truth labels for scenes containing several specular and transparent surfaces. Our acquisition pipeline leverages a novel deep space-time stereo framework, enabling easy and accurate labeling with sub-pixel precision. The dataset comprises 606 samples collected in 85 different scenes. Each sample includes both a high-resolution stereo pair (12 Mpx) and an unbalanced stereo pair (left: 12 Mpx, right: 1.1 Mpx), the latter being typical of modern mobile devices that mount sensors with different resolutions. Additionally, we provide manually annotated material segmentation masks and 15K unlabeled samples. The dataset is split into a training set and two test sets, with the test sets devoted to evaluating stereo and monocular depth estimation networks. Our experiments highlight the open challenges and future research directions in this field.