Text generation has become more accessible than ever, and growing interest in these systems, especially those using large language models, has spurred an increasing number of related publications. We provide a systematic literature review of 244 selected papers published between 2017 and 2024. This review categorizes works in text generation into five main tasks: open-ended text generation, summarization, translation, paraphrasing, and question answering. For each task, we review its relevant characteristics, sub-tasks, and specific challenges (e.g., missing datasets for multi-document summarization, coherence in story generation, and complex reasoning for question answering). Additionally, we assess current approaches for evaluating text generation systems and identify problems with current metrics. Our investigation reveals nine prominent challenges common to all tasks and sub-tasks in recent text generation publications: bias, reasoning, hallucinations, misuse, privacy, interpretability, transparency, datasets, and computing. We provide a detailed analysis of these challenges, their potential solutions, and the gaps that still require further engagement from the community. This systematic literature review targets two main audiences: early career researchers in natural language processing looking for an overview of the field and promising research directions, and experienced researchers seeking a detailed view of tasks, evaluation methodologies, open challenges, and recent mitigation strategies.