Single-view depth estimation refers to the ability to derive three-dimensional information per pixel from a single two-dimensional image. It is an ill-posed problem, because multiple depth solutions are consistent with the 3D geometry observed from a single view. While deep neural networks have proven effective at estimating depth from a single view, the majority of current methods are deterministic. Accounting for uncertainty in the predictions can avoid disastrous consequences in fields such as autonomous driving or medical robotics. We address this problem by quantifying the uncertainty of supervised single-view depth with Bayesian deep neural networks. However, there are scenarios, notably endoscopic imaging in medicine, where such annotated depth data is not available. To alleviate this lack of data, we present a method that improves synthetic-to-real domain transfer: an uncertainty-aware teacher-student architecture trained in a self-supervised manner that takes the teacher's uncertainty into account. Given the vast amount of unannotated data and the difficulty of capturing annotated depth in medical minimally invasive procedures, we advocate a fully self-supervised approach that requires only RGB images and the geometric and photometric calibration of the endoscope. In endoscopic imaging, the camera and light sources are co-located at a small distance from the target surfaces. In this setup, brighter areas of the image are nearer to the camera, while darker areas are farther away. Building on this observation, we exploit the fact that, for a given albedo and surface orientation, pixel brightness is inversely proportional to the square of the distance to the surface. We therefore propose illumination as a strong single-view self-supervisory signal for deep neural networks.
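The uncertainty-aware teacher-student idea can be sketched as a per-pixel loss in which the teacher's predictive variance down-weights the supervision it provides to the student. This is a minimal illustrative sketch, assuming a heteroscedastic (Gaussian negative log-likelihood style) weighting; the function name and the exact form of the loss are assumptions, not the paper's published formulation.

```python
import numpy as np

def uncertainty_weighted_student_loss(student_depth, teacher_depth, teacher_var):
    """Hypothetical sketch of an uncertainty-aware teacher-student loss.

    Pixels where the teacher is uncertain (large predictive variance)
    contribute less to the student's supervision. The heteroscedastic
    weighting used here is an assumption for illustration only.
    """
    residual = (student_depth - teacher_depth) ** 2
    # Down-weight squared residuals by the teacher's per-pixel variance;
    # the log term prevents the trivial solution of inflating the variance.
    weighted = residual / teacher_var + np.log(teacher_var)
    return float(np.mean(weighted))
```

With unit teacher variance the loss reduces to the mean squared residual, and raising the variance at a pixel shrinks that pixel's influence on the student.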
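The inverse-square relation between pixel brightness and distance can be turned into a per-pixel photometric self-supervision signal: the depth predicted by the network implies a brightness, which is compared against the observed image. The sketch below is a minimal illustration under the assumption of a co-located camera and light source and a single scalar gain `gain` folding together albedo, shading, and light intensity; all names are hypothetical.

```python
import numpy as np

def illumination_selfsup_loss(pred_depth, observed_brightness, gain=1.0):
    """Minimal sketch of illumination-based self-supervision.

    With the light source co-located with the camera, brightness falls off
    as B ~ gain / d**2 for a given albedo and surface orientation. The
    predicted depth is penalized when the brightness it implies disagrees
    with the observed image. The scalar `gain` is an illustrative
    simplification of the endoscope's photometric calibration.
    """
    # Clip depth away from zero to avoid division blow-up at degenerate pixels.
    implied_brightness = gain / np.clip(pred_depth, 1e-6, None) ** 2
    return float(np.mean((implied_brightness - observed_brightness) ** 2))
```

When the observed brightness exactly matches the inverse-square prediction the loss vanishes, so gradient descent pushes the depth map toward the geometry consistent with the illumination.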