Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model

In many real-world applications, the relative depth of objects in an image is crucial for scene understanding. Recent approaches mainly tackle depth prediction in monocular images by treating it as a regression task. Yet, since one is primarily interested in an order relation, ranking methods suggest themselves as a natural alternative to regression, and indeed, ranking approaches that leverage pairwise comparisons as training information ("object A is closer to the camera than B") have shown promising performance on this problem. In this paper, we elaborate on the use of so-called listwise ranking as a generalization of the pairwise approach. Our method is based on the Plackett-Luce (PL) model, a probability distribution on rankings, which we combine with a state-of-the-art neural network architecture and a simple sampling strategy to reduce training complexity. Moreover, by taking advantage of the representation of PL as a random utility model, the proposed predictor offers a natural way to recover (shift-invariant) metric depth information from the ranking-only data provided at training time. An empirical evaluation on several benchmark datasets in a "zero-shot" setting demonstrates the effectiveness of our approach compared to existing ranking and regression methods.
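The listwise objective can be made concrete with a small sketch. Suppose a network assigns each sampled pixel a real-valued score s_i, where a higher score means "closer to the camera". The Plackett-Luce probability of a ranking pi, listed closest first, is then prod_k exp(s_pi(k)) / sum_{j>=k} exp(s_pi(j)), and a natural listwise loss is the negative log of that probability. The Python below uses only the standard library and is an illustration under these assumptions, not the authors' implementation. All function names are hypothetical, and uniform pixel sampling in `sample_training_list` stands in for the paper's sampling strategy, whose details the abstract does not give. `sample_pl_ranking` illustrates the random utility view of PL: perturb each score with i.i.d. standard Gumbel noise and sort by the perturbed value.

```python
import math
import random
from typing import List, Sequence


def pl_log_likelihood(scores: Sequence[float], ranking: Sequence[int]) -> float:
    """Log-probability of `ranking` (item indices, closest first) under a
    Plackett-Luce model whose log-scale parameters are `scores`:
    log P = sum_k [ s[r_k] - log sum_{j >= k} exp(s[r_j]) ]."""
    ordered = [scores[i] for i in ranking]
    total = 0.0
    for k in range(len(ordered)):
        tail = ordered[k:]
        m = max(tail)  # subtract the max inside log-sum-exp for numerical stability
        log_norm = m + math.log(sum(math.exp(v - m) for v in tail))
        total += ordered[k] - log_norm
    return total


def pl_listwise_loss(scores: Sequence[float], ranking: Sequence[int]) -> float:
    """Listwise training loss: negative PL log-likelihood of the observed ranking."""
    return -pl_log_likelihood(scores, ranking)


def _standard_gumbel(rng: random.Random) -> float:
    """Draw from a standard Gumbel distribution by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # rng.random() lies in [0, 1); exclude 0 so both logs are finite
        u = rng.random()
    return -math.log(-math.log(u))


def sample_pl_ranking(scores: Sequence[float], rng: random.Random) -> List[int]:
    """Sample a ranking from PL via its random utility form: perturb each score
    with i.i.d. standard Gumbel noise and sort by utility, highest first."""
    utilities = [s + _standard_gumbel(rng) for s in scores]
    return sorted(range(len(scores)), key=lambda i: utilities[i], reverse=True)


def sample_training_list(depths: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Hypothetical sampling step: choose k distinct pixels uniformly at random and
    order them from closest to farthest by ground-truth depth."""
    idx = rng.sample(range(len(depths)), k)
    return sorted(idx, key=lambda i: depths[i])


if __name__ == "__main__":
    rng = random.Random(0)
    depths = [3.2, 1.1, 7.5, 2.0, 5.4]    # toy ground-truth depths for 5 pixels
    scores = [0.1, 1.5, -2.0, 0.9, -0.8]  # toy predicted scores (higher = closer)
    ranking = sample_training_list(depths, 3, rng)
    print("sampled list (closest first):", ranking)
    print("loss:", pl_listwise_loss(scores, ranking))
    shifted = [s + 10.0 for s in scores]
    print("shift-invariant:", math.isclose(pl_listwise_loss(shifted, ranking),
                                           pl_listwise_loss(scores, ranking),
                                           abs_tol=1e-12))
    print("PL sample of all 5 pixels:", sample_pl_ranking(scores, rng))
```

Adding the same constant to every score leaves the loss unchanged. Scores fitted to rankings alone are therefore identified only up to an additive shift, which is consistent with the abstract's description of the recovered depth as shift-invariant. The exact mapping from scores to metric depth used in the paper is not reproduced by this sketch.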