Open-vocabulary scene graph generation (SGG) aims to describe visual scenes with flexible relation phrases beyond a fixed predicate set. Existing methods usually treat annotated triplets as positives and all unannotated object-pair relations as negatives. However, scene graph annotations are inherently incomplete: many valid relations are missing, and the same interaction can be described at different granularities, e.g., \textit{on}, \textit{standing on}, \textit{resting on}, and \textit{supported by}. This issue becomes more severe in open-vocabulary SGG due to the much larger relation space. We propose \textbf{ReLIC-SGG}, a relation-incompleteness-aware framework that treats unannotated relations as latent variables rather than definite negatives. ReLIC-SGG builds a semantic relation lattice to model similarity, entailment, and contradiction among open-vocabulary predicates, and uses it to infer missing positive relations from visual-language compatibility, graph context, and semantic consistency. A positive-unlabeled graph learning objective further reduces false-negative supervision, while lattice-guided decoding produces compact and semantically consistent scene graphs. Experiments on conventional, open-vocabulary, and panoptic SGG benchmarks show that ReLIC-SGG improves rare and unseen predicate recognition and better recovers missing relations.
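To make the contrast with closed-world training concrete, the following is an illustrative sketch only. The abstract does not state the form of the positive-unlabeled objective, so we show the standard non-negative PU risk estimator from the PU learning literature, not necessarily the objective used in ReLIC-SGG. All symbols are assumptions introduced for this illustration: $g$ is a triplet scoring function, $\ell$ a surrogate loss, $\mathcal{P}$ the set of annotated triplets, $\mathcal{U}$ the set of unannotated candidate triplets, and $\pi \in (0,1)$ an assumed prior probability that a candidate relation is valid.
\[
\widehat{R}_{p}^{+}(g) = \frac{1}{|\mathcal{P}|} \sum_{x \in \mathcal{P}} \ell\bigl(g(x), +1\bigr), \qquad
\widehat{R}_{p}^{-}(g) = \frac{1}{|\mathcal{P}|} \sum_{x \in \mathcal{P}} \ell\bigl(g(x), -1\bigr), \qquad
\widehat{R}_{u}^{-}(g) = \frac{1}{|\mathcal{U}|} \sum_{x \in \mathcal{U}} \ell\bigl(g(x), -1\bigr),
\]
\[
\widehat{R}_{\mathrm{pu}}(g) = \pi\, \widehat{R}_{p}^{+}(g) + \max\Bigl(0,\; \widehat{R}_{u}^{-}(g) - \pi\, \widehat{R}_{p}^{-}(g)\Bigr).
\]
A closed-world objective would penalize every element of $\mathcal{U}$ as a negative through $\widehat{R}_{u}^{-}(g)$ alone. The correction term $-\pi\, \widehat{R}_{p}^{-}(g)$ subtracts the share of that penalty attributable to valid but unannotated relations, and the $\max(0, \cdot)$ clamp keeps the estimated negative risk from becoming negative.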