We present an automatic text expansion system for generating English sentences. It performs Natural Language Generation (NLG) by combining linguistic rules with statistical approaches. Here, "automatic" means that the system can generate coherent and correct sentences from a minimal set of words. The design has been modular from its inception and can be adapted to other languages, which is one of its greatest advantages. For English, we have created aLexiE, a highly precise, wide-coverage lexicon that is a contribution in its own right. We have evaluated the resulting NLG library in an Augmentative and Alternative Communication (AAC) proof of concept, both directly (by regenerating corpus sentences) and manually (from annotations), using a popular corpus in the NLG field. In a second analysis, we compared the quality of text expansion in English with that in Spanish, using an ad-hoc Spanish-English parallel corpus. The system might also be applied to other domains, such as report and news generation.