Pre-trained seq2seq models excel at graph semantic parsing when rich annotated data is available, but generalize worse to out-of-distribution (OOD) and long-tail examples. In comparison, symbolic parsers under-perform on population-level metrics, but exhibit unique strength in OOD and tail generalization. In this work, we study a compositionality-aware approach to neural-symbolic inference informed by model confidence, performing fine-grained neural-symbolic reasoning at the subgraph level (i.e., nodes and edges) and precisely targeting subgraph components with high uncertainty in the neural parser. As a result, the method combines the distinct strengths of the neural and symbolic approaches in capturing different aspects of the graph prediction, leading to well-rounded generalization performance both across domains and in the tail. We empirically investigate the approach on the English Resource Grammar (ERG) parsing problem, using a diverse suite of standard in-domain and seven OOD corpora. Our approach leads to 35.26% and 35.60% error reduction in aggregated Smatch score over the neural and symbolic approaches respectively, and a 14% absolute accuracy gain in key tail linguistic categories over the neural model, outperforming prior state-of-the-art methods that do not account for compositionality or uncertainty.
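To make the idea of confidence-informed, subgraph-level combination concrete, the following is a minimal sketch and not the authors' implementation. It assumes each node or edge of the neural parse carries a label and a confidence score, and that a symbolic parser supplies an alternative label for some of the same components. The names `Prediction` and `merge_subgraph_predictions`, the fixed `threshold`, and the example labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass(frozen=True)
class Prediction:
    """A neural prediction for one graph component (hypothetical structure)."""
    label: str
    confidence: float  # assumed to be a probability in [0, 1]


def merge_subgraph_predictions(
    neural: Dict[Hashable, Prediction],
    symbolic: Dict[Hashable, str],
    threshold: float = 0.9,  # assumed cut-off, not a value from the paper
) -> Dict[Hashable, str]:
    """Keep confident neural labels; defer to the symbolic label otherwise.

    Keys identify nodes (e.g. "n1") or edges (e.g. ("n2", "n1")).
    """
    merged: Dict[Hashable, str] = {}
    for key, pred in neural.items():
        fallback = symbolic.get(key)
        if pred.confidence < threshold and fallback is not None:
            # The neural parser is uncertain and a symbolic analysis exists.
            merged[key] = fallback
        else:
            # Either the neural parser is confident or there is no alternative.
            merged[key] = pred.label
    return merged


# Illustrative inputs only; labels loosely imitate ERG-style predicate names.
neural = {
    "n1": Prediction("_dog_n_1", 0.98),
    "n2": Prediction("_bark_n_1", 0.41),      # uncertain node
    "n3": Prediction("_loud_a_1", 0.30),      # uncertain, no symbolic label
    ("n2", "n1"): Prediction("ARG1", 0.95),   # confident edge
}
symbolic = {"n1": "_dog_n_1", "n2": "_bark_v_1", ("n2", "n1"): "ARG2"}

merged = merge_subgraph_predictions(neural, symbolic)
```

In this sketch only the uncertain node `n2` is replaced by the symbolic label, while confident components and components without a symbolic alternative keep the neural prediction. A hard threshold is the simplest possible decision rule; the abstract does not specify how the method actually weighs the two parsers.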