To date, most work on text simplification has focused on sentence-level inputs. Early attempts at document simplification merely applied these sentence-level approaches iteratively over the sentences of a document. However, this sentence-by-sentence strategy fails to preserve discourse structure coherently, leading to suboptimal output quality. Recently, strategies from controllable simplification have been leveraged to achieve state-of-the-art results on document simplification by first generating a document-level plan (a sequence of sentence-level simplification operations) and then using this plan to guide sentence-level simplification downstream. However, this approach remains limited in that the simplification model has no direct access to the local inter-sentence document context, which likely has a negative impact on surface realisation. We explore various systems that use document context within the simplification process itself, either by iterating over larger text units or by extending the system architecture to attend over a high-level representation of document context. In doing so, we achieve state-of-the-art performance on the document simplification task, even when not relying on plan-guidance. Further, we investigate the performance and efficiency tradeoffs of the system variants and suggest when each should be preferred.