There is a significant disconnect between linguistic theory and modern NLP practice, which relies heavily on inscrutable black-box architectures. DisCoCirc is a newly proposed model for meaning that aims to bridge this divide by providing neuro-symbolic models that incorporate linguistic structure. DisCoCirc represents natural language text as a `circuit' that captures the core semantic information of the text. These circuits can then be interpreted as modular machine learning models. DisCoCirc also fulfils a second major aim: providing an NLP model that can be implemented on near-term quantum computers. In this paper we describe a software pipeline that converts English text to its DisCoCirc representation. The pipeline achieves coverage over a large fragment of the English language. It takes as input Combinatory Categorial Grammar (CCG) parses of the text together with coreference resolution information. This syntactic and semantic information is used in several steps to convert the text first into a simply-typed $\lambda$-calculus term, and then into a circuit diagram. The pipeline will enable the application of the DisCoCirc framework to NLP tasks, using both classical and quantum approaches.
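To make the idea of going from a typed term to a circuit more concrete, the following is a minimal toy sketch, not the pipeline described in this paper. It assumes hand-written terms in place of real CCG parses, two base types `n` and `s`, and words whose arguments are plain nouns. All names here (`Base`, `Arrow`, `Const`, `App`, `type_of`, `to_circuit`) are hypothetical and chosen only for illustration. Reusing the same noun name across sentences stands in for coreference resolution.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Base:
    name: str  # base type, e.g. "n" (noun) or "s" (sentence)


@dataclass(frozen=True)
class Arrow:
    arg: "Type"  # type of the argument
    res: "Type"  # type of the result


Type = Union[Base, Arrow]


@dataclass(frozen=True)
class Const:
    name: str  # the word
    type: Type  # its simple type


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Const, App]

N, S = Base("n"), Base("s")


def type_of(term):
    """Compute the simple type of a term, rejecting ill-typed applications."""
    if isinstance(term, Const):
        return term.type
    fn_type, arg_type = type_of(term.fn), type_of(term.arg)
    if not isinstance(fn_type, Arrow) or fn_type.arg != arg_type:
        raise TypeError(f"cannot apply {fn_type} to {arg_type}")
    return fn_type.res


def to_circuit(sentences):
    """Flatten each sentence term into a (gate, wires) pair.

    Toy assumption: every argument is a noun constant, so each sentence
    becomes one gate acting on the wires named after its nouns.
    """
    gates = []
    for term in sentences:
        if type_of(term) != S:
            raise TypeError("expected a term of sentence type")
        head, args = term, []
        while isinstance(head, App):  # unwind the application spine
            args.append(head.arg)
            head = head.fn
        args.reverse()  # restore application order
        if not all(isinstance(a, Const) for a in args):
            raise NotImplementedError("nested arguments are out of scope")
        gates.append((head.name, tuple(a.name for a in args)))
    return gates


alice, bob = Const("Alice", N), Const("Bob", N)
walks = Const("walks", Arrow(N, S))
# Mirrors the CCG category (S\NP)/NP: the object is consumed first.
likes = Const("likes", Arrow(N, Arrow(N, S)))

text = [App(App(likes, bob), alice), App(walks, alice)]
circuit = to_circuit(text)
```

In this sketch, `circuit` lists one gate per sentence, and the wire named `Alice` is shared by both gates, which is the sense in which a circuit composes sentences along the entities they mention. A real implementation would also need to handle higher-order words such as adverbs and adjectives, which this toy deliberately rejects.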