This research introduces a new parsing approach that builds on earlier syntactic work on context-free grammar (CFG) and generalized phrase structure grammar (GPSG). The approach comprises a new parsing algorithm and a set of syntactic rules and features that overcome the limitations of CFG. It generates both dependency and constituency parse trees while accommodating noise and incomplete parses. The system was tested on data from Universal Dependencies, achieving a promising average Unlabeled Attachment Score (UAS) of 54.5% on the development set (7 corpora) and 53.8% on the test set (12 corpora). The system also produces multiple parse hypotheses, which allows subsequent reranking to improve parsing accuracy. The approach makes much of the theoretical syntactic work done since the 1950s usable in a computational context, and its application provides a transparent and interpretable NLP model for processing language input.
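For readers unfamiliar with the metric, the following is a minimal sketch of how an Unlabeled Attachment Score and its average over corpora could be computed. It is not the authors' evaluation code. The function names `uas` and `macro_average`, the toy sentence, and the choices to average per corpus with equal weight and to count an unattached token (head `None`, as might occur in an incomplete parse) as an error are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def uas(gold_heads: Sequence[int], pred_heads: Sequence[Optional[int]]) -> float:
    """Fraction of tokens whose predicted head index equals the gold head.

    Heads are 1-based token indices, with 0 standing for the root.
    A predicted head of None (token left unattached) never matches,
    so it is counted as an error. This convention is an assumption.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must align token by token")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


def macro_average(scores: Sequence[float]) -> float:
    """Unweighted mean of per-corpus scores (assumed averaging scheme)."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)


# Toy sentence: "She reads long books"
# Gold heads: She -> reads (2), reads -> root (0), long -> books (4), books -> reads (2)
gold = [2, 0, 4, 2]
# Hypothetical parser output that wrongly attaches "long" to "reads"
pred = [2, 0, 2, 2]

toy_score = uas(gold, pred)  # 3 of 4 heads are correct
```

Because UAS ignores dependency labels, only the head index of each token is compared. A labeled score would additionally require the relation label to match.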