Despite advances in depth estimation, flying points remain a persistent failure mode: near object boundaries, depth estimators often predict spurious 3D points in the empty space between foreground and background surfaces. We trace this artifact to a standard modeling choice: assigning each pixel a single depth hypothesis. At boundaries, a pixel can straddle a foreground and a background surface, so its true depth is ambiguous between the two. A model that predicts a single depth cannot keep both possibilities, so training instead pulls the prediction toward an intermediate depth that lies on neither surface. We address this with MDA, a mixture-density representation that lets the model predict multiple depth hypotheses and their associated probabilities for each pixel. Near boundaries, different hypotheses can align with different surfaces, and the decoded depth is selected from one of these hypotheses rather than placed in the empty space between them. Across different backbones, MDA substantially improves boundary reconstruction and largely removes flying-point artifacts even under severe input blur, while adding negligible runtime overhead. The same mixture-density framework naturally extends to transparent objects, where it predicts multiple depth layers at transparent pixels, and to sky regions, where a dedicated component separates the unbounded sky from finite-depth regions, producing flying-point-free skylines. Project Page: https://biansy000.github.io/mda-site/.
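To make the decoding argument concrete, the following minimal sketch contrasts two ways of turning a per-pixel mixture into one depth value. It is an illustration under stated assumptions, not the MDA implementation: the function names `decode_mean` and `decode_mode`, the two-hypothesis pixel, and the choice of the most probable hypothesis as the selection rule are all hypothetical, since the abstract only states that the decoded depth is selected from one of the hypotheses.

```python
# Illustrative sketch only: names, values, and the argmax selection rule
# are assumptions, not taken from the MDA paper.

def decode_mean(weights, depths):
    """Probability-weighted average of the hypotheses.

    This mimics the intermediate depth that a single-hypothesis model
    is pulled toward when a pixel is ambiguous between two surfaces.
    """
    return sum(w * d for w, d in zip(weights, depths))


def decode_mode(weights, depths):
    """Return the depth of the most probable hypothesis (assumed rule)."""
    best = max(range(len(weights)), key=lambda k: weights[k])
    return depths[best]


# Hypothetical boundary pixel: foreground surface at 1.0 m,
# background surface at 5.0 m, with probabilities 0.6 and 0.4.
weights = [0.6, 0.4]
depths = [1.0, 5.0]

mean_depth = decode_mean(weights, depths)  # lies between the two surfaces
mode_depth = decode_mode(weights, depths)  # lies on the foreground surface

print(f"averaged depth: {mean_depth:.2f} m")
print(f"selected depth: {mode_depth:.2f} m")
```

In this toy case the averaged depth falls in the empty space between the two surfaces, which is exactly where a flying point would appear, while the selected depth stays on one of the surfaces.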