Transparent objects are common in daily life, and it is important to understand their multi-layer depth, including the transparent surface and the objects behind it. Existing methods for multi-layer depth typically extend single-layer prediction: they define layers by the front-to-back ordering of 3D points and predict the layers sequentially. However, because layered geometry can admit multiple valid groupings of 3D points into layers, a predefined grouping strategy is inherently restrictive. In this work, we propose SeeGroup, a multi-layer depth estimation method that avoids imposing a predefined grouping and allows the model itself to adaptively assign surfaces to depth maps. We formulate per-pixel multi-layer depth as a point process, treating depth layers as unordered events along each camera ray. This induces a permutation-invariant likelihood over the observed depth layers, yielding a loss that naturally supports arbitrary layer groupings. Experiments demonstrate that our method significantly advances the state of the art in multi-layer depth estimation, improving quadruplet relative depth accuracy on the LayeredDepth benchmark from 61.34% to 70.09%. Code is available at https://github.com/princeton-vl/SeeGroup.
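To make the idea of a permutation-invariant likelihood concrete, the sketch below evaluates the negative log-likelihood of an unordered set of depths along one camera ray under an inhomogeneous Poisson point process with a piecewise-constant intensity. This is an illustrative assumption only: the abstract does not state which point process, intensity parameterization, or discretization SeeGroup uses, and the function name `ray_point_process_nll` and its arguments are hypothetical rather than taken from the released code.

```python
import numpy as np

def ray_point_process_nll(log_intensity, bin_edges, depths):
    """Negative log-likelihood of unordered depths along one ray.

    Assumed model (not confirmed by the source): an inhomogeneous Poisson
    point process whose intensity is constant within each depth bin.
    NLL = integral of intensity over the ray - sum of log intensity at events.
    """
    log_intensity = np.asarray(log_intensity, dtype=float)  # one value per bin
    bin_edges = np.asarray(bin_edges, dtype=float)          # len = n_bins + 1
    depths = np.asarray(depths, dtype=float)                # observed layers

    widths = np.diff(bin_edges)
    # Integral of the intensity along the ray (the compensator term).
    integral = np.sum(np.exp(log_intensity) * widths)

    # Map each observed depth to the bin that contains it.
    idx = np.searchsorted(bin_edges, depths, side="right") - 1
    idx = np.clip(idx, 0, len(widths) - 1)

    # The sum over events does not depend on the order of `depths`.
    return integral - np.sum(log_intensity[idx])


edges = np.array([0.0, 1.0, 2.0, 4.0])
log_lam = np.log(np.array([0.5, 2.0, 0.25]))
nll_a = ray_point_process_nll(log_lam, edges, [1.2, 3.4])
nll_b = ray_point_process_nll(log_lam, edges, [3.4, 1.2])
print(np.isclose(nll_a, nll_b))
```

Because the observed depths enter the loss only through a sum, swapping the order of the layers leaves the value unchanged, which is the property that lets a loss of this kind avoid fixing a front-to-back layer assignment in advance.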