In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, nonprojective and concurrent relations, as well as implicit and explicit signals that provide explainable rationales for our analyses. We survey shortcomings of RST and other existing frameworks, such as Segmented Discourse Representation Theory (SDRT), the Penn Discourse Treebank (PDTB) and Discourse Dependencies, and address them using constructs from the proposed theory. We provide tools for annotating, searching and visualizing data, and present and evaluate a freely available corpus of English annotated according to our framework, encompassing 12 spoken and written genres with over 200K tokens. Finally, we discuss automatic parsing, evaluation metrics and applications for data in our framework.
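To make the three graph properties concrete, here is a minimal sketch. It assumes that discourse units are numbered left to right and that each relation is a directed edge from a dependent unit to the unit it attaches to. The names `Relation`, `crossing_pairs`, `concurrent_pairs` and `multi_parent_units` are hypothetical illustrations and are not the API of the eRST tools. The sketch also adopts one possible reading of the terms. A unit with more than one parent is tree-breaking. Two arcs that cross when drawn above the text are nonprojective. Two or more relations on the same pair of units are concurrent.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Relation:
    """A directed discourse relation between two units (hypothetical schema)."""
    source: int  # index of the dependent unit
    target: int  # index of the unit it attaches to
    label: str   # relation name, e.g. "elaboration"


def span(rel):
    """Return the relation's extent as an ordered (left, right) pair of unit indices."""
    return (min(rel.source, rel.target), max(rel.source, rel.target))


def crossing_pairs(relations):
    """Pairs of relations whose arcs cross, i.e. nonprojective configurations."""
    result = []
    for r1, r2 in combinations(relations, 2):
        (a, b), (c, d) = sorted([span(r1), span(r2)])
        if a < c < b < d:  # strictly interleaved endpoints; shared endpoints do not cross
            result.append((r1, r2))
    return result


def concurrent_pairs(relations):
    """Unit pairs linked by more than one relation at the same time."""
    counts = Counter(span(r) for r in relations)
    return sorted(pair for pair, n in counts.items() if n > 1)


def multi_parent_units(relations):
    """Units attached to more than one parent, which a strict tree would forbid."""
    counts = Counter(r.source for r in relations)
    return sorted(unit for unit, n in counts.items() if n > 1)


# Toy graph over four units (0-3); labels are illustrative only.
rels = [
    Relation(1, 0, "elaboration"),
    Relation(2, 0, "cause"),
    Relation(3, 1, "contrast"),
    Relation(3, 1, "concession"),  # concurrent with "contrast" on units 1-3
    Relation(3, 2, "result"),      # gives unit 3 a second parent
]

print(len(crossing_pairs(rels)))  # arcs 0-2 and 1-3 interleave
print(concurrent_pairs(rels))
print(multi_parent_units(rels))
```

None of these three checks could report anything for a strict projective tree with a single relation per edge, which is the kind of structure the framework's graphs go beyond.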