Text simplification (TS) systems rewrite text to make it more readable while preserving its content. However, what makes a text easy to read depends on the intended readers. Recent work has shown that pre-trained language models can simplify text using a wealth of techniques to control output simplicity, ranging from specifying only the desired reading grade level to directly specifying low-level edit operations. Yet it remains unclear how to set these control parameters in practice. Existing approaches set them at the corpus level, disregarding the complexity of individual inputs and considering only one level of output complexity. In this work, we conduct an empirical study to understand how different control mechanisms impact the adequacy and simplicity of text simplification systems. Based on these insights, we introduce a simple method that predicts, on an instance-by-instance basis, the edit operations required to simplify a text for a specific grade level. This approach improves the quality of the simplified outputs over corpus-level search-based heuristics.