Recognizing unknown objects is crucial for safety-critical applications such as autonomous driving and robotics. Open-Set Panoptic Segmentation (OPS) aims to segment known thing and stuff classes while identifying valid unknown objects as separate instances. Prior OPS approaches largely treat known categories as a flat label set, ignoring the semantic hierarchy, which provides valuable structural priors for distinguishing unknown objects from in-distribution classes. In this work, we propose Hyp2Former, an end-to-end framework for OPS that does not require explicit modeling of unknowns during training and instead learns hierarchical semantic similarities continuously in hyperbolic space. By explicitly encoding hierarchical relationships among known categories, the model learns a structured embedding space that captures multiple levels of semantic abstraction. As a result, unknown objects that cannot be confidently classified as known categories remain in close proximity to higher-level concepts (e.g., an unknown animal remains closer to "animal" or "object" than to unrelated concepts such as "electronics" or "stuff") and can therefore be reliably detected, even if their fine-grained category was not represented during training. Empirical evaluations on multiple public datasets, including MS COCO, Cityscapes, and Lost&Found, demonstrate that Hyp2Former outperforms existing methods on OPS, achieving the best balance between unknown object discovery and in-distribution robustness.
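To make the geometric intuition concrete, the following is a minimal toy sketch, not the Hyp2Former implementation. It assumes a 2D Poincaré ball, hand-placed prototype coordinates, and a simple distance threshold, all of which are hypothetical choices made for illustration. Coarse concepts sit nearer the origin and fine-grained classes nearer the boundary, so a query that is far from every leaf prototype can still lie close to its parent concept.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def polar(radius, angle):
    """Convert polar coordinates to a 2D point (radius must be below 1)."""
    return (radius * math.cos(angle), radius * math.sin(angle))


# Hypothetical prototypes, placed by hand for illustration only.
# Coarse concepts are closer to the origin than fine-grained classes.
SUPERCLASSES = {
    "animal": polar(0.5, 0.0),
    "electronics": polar(0.5, math.pi),
}
LEAVES = {
    "dog": polar(0.9, 0.3),
    "cat": polar(0.9, -0.3),
    "laptop": polar(0.9, math.pi),
}


def nearest(query, prototypes):
    """Return the name of the closest prototype and its distance."""
    name = min(prototypes, key=lambda k: poincare_distance(query, prototypes[k]))
    return name, poincare_distance(query, prototypes[name])


def classify(query, leaf_threshold=1.0):
    """Assign a leaf class, or fall back to 'unknown <parent concept>'.

    leaf_threshold is an assumed, untuned value.
    """
    leaf, leaf_dist = nearest(query, LEAVES)
    parent, _ = nearest(query, SUPERCLASSES)
    if leaf_dist > leaf_threshold:
        return "unknown " + parent
    return leaf


if __name__ == "__main__":
    print(classify(polar(0.88, 0.3)))  # a query close to the "dog" prototype
    print(classify(polar(0.8, 0.0)))   # a query between "dog" and "cat"
```

In this toy setup, the second query is rejected by every leaf prototype yet stays much closer to "animal" than to "electronics", mirroring the behavior described for unknown objects. A learned model would obtain such embeddings from training rather than from fixed coordinates.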