In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, non-projective and concurrent relations, as well as implicit and explicit signals that provide explainable rationales for our analyses. We survey shortcomings of RST and other existing frameworks, such as Segmented Discourse Representation Theory (SDRT), the Penn Discourse Treebank (PDTB) and Discourse Dependencies, and address them using constructs in the proposed theory. We provide annotation, search and visualization tools for data in the framework, and we present and evaluate a freely available corpus of English annotated according to it, encompassing 12 spoken and written genres with over 200K tokens. Finally, we discuss automatic parsing, evaluation metrics and applications for data in our framework.
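To make the notion of a relation graph with concurrent and tree-breaking relations more concrete, the following minimal sketch shows one way such data could be represented. It is an illustration under stated assumptions, not the format of the corpus or of the tools described here: the class names `Signal` and `Relation`, the helper functions, the relation labels and the token indices are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signal:
    """Hypothetical signal record: the evidence that motivates a relation."""
    kind: str      # illustrative label, e.g. "dm" for a discourse marker
    tokens: tuple  # token indices; an empty tuple stands for an implicit signal


@dataclass
class Relation:
    """Hypothetical edge between two discourse units."""
    source: int           # id of the dependent unit
    target: int           # id of the unit it attaches to
    label: str            # relation name (illustrative only)
    primary: bool = True  # False marks an additional edge outside the tree
    signals: list = field(default_factory=list)


def concurrent_pairs(relations):
    """Return the (source, target) pairs linked by more than one relation."""
    counts = {}
    for rel in relations:
        key = (rel.source, rel.target)
        counts[key] = counts.get(key, 0) + 1
    return sorted(pair for pair, n in counts.items() if n > 1)


def units_with_multiple_parents(relations):
    """Return units attached to more than one distinct target.

    Such units cannot be part of a strict tree, so this is a simple
    (assumed) way to spot tree-breaking structure.
    """
    parents = {}
    for rel in relations:
        parents.setdefault(rel.source, set()).add(rel.target)
    return sorted(unit for unit, targets in parents.items() if len(targets) > 1)


# Toy example with three units; every value here is invented for illustration.
relations = [
    Relation(2, 1, "cause", signals=[Signal("dm", (7,))]),
    Relation(2, 1, "sequence", primary=False),  # concurrent with "cause"
    Relation(3, 1, "elaboration"),
    Relation(3, 2, "contrast", primary=False, signals=[Signal("dm", (12,))]),
]

print(concurrent_pairs(relations))
print(units_with_multiple_parents(relations))
```

In this toy graph, units 2 and 1 are linked by two relations at once, and unit 3 attaches to both unit 1 and unit 2, so the structure is a graph rather than a tree. Storing signals on each edge keeps the rationale for a relation next to the relation itself.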