Current high-performance semantic segmentation models are purely data-driven sub-symbolic approaches and blind to the structured nature of the visual world. This is in stark contrast to human cognition which abstracts visual perceptions at multiple levels and conducts symbolic reasoning with such structured abstraction. To fill these fundamental gaps, we devise LOGICSEG, a holistic visual semantic parser that integrates neural inductive learning and logic reasoning with both rich data and symbolic knowledge. In particular, the semantic concepts of interest are structured as a hierarchy, from which a set of constraints are derived for describing the symbolic relations and formalized as first-order logic rules. After fuzzy logic-based continuous relaxation, logical formulae are grounded onto data and neural computational graphs, hence enabling logic-induced network training. During inference, logical constraints are packaged into an iterative process and injected into the network in a form of several matrix multiplications, so as to achieve hierarchy-coherent prediction with logic reasoning. These designs together make LOGICSEG a general and compact neural-logic machine that is readily integrated into existing segmentation models. Extensive experiments over four datasets with various segmentation models and backbones verify the effectiveness and generality of LOGICSEG. We believe this study opens a new avenue for visual semantic parsing.
翻译:当前高性能语义分割模型均为纯数据驱动的亚符号方法,对视觉世界的结构化本质缺乏认知。这与人类认知形成鲜明对比——人类通过多层抽象提取视觉感知,并基于结构化抽象进行符号推理。为弥补这一根本性缺口,我们设计了LOGICSEG这一整体性视觉语义解析器,将神经归纳学习与逻辑推理相结合,同时利用丰富数据和符号知识。具体而言,将关注的语义概念构建为层次结构,从中推导出一组用于描述符号关系的约束,并将其形式化为一阶逻辑规则。通过基于模糊逻辑的连续松弛处理,将逻辑公式锚定到数据与神经计算图上,从而实现逻辑驱动的网络训练。在推理阶段,逻辑约束被封装为迭代过程,以若干矩阵乘法的形式注入网络,从而通过逻辑推理实现层次一致的预测。这些设计共同使LOGICSEG成为一个普适且紧凑的神经-逻辑机器,可便捷地集成到现有分割模型中。在四个数据集上基于多种分割模型与骨干网络的广泛实验验证了LOGICSEG的有效性与通用性。我们认为,本研究为视觉语义解析开辟了新路径。