Text generation has become more accessible than ever, and growing interest in these systems, especially those using large language models, has spurred an increasing number of related publications. We provide a systematic literature review of 244 selected papers published between 2017 and 2024. This review categorizes works in text generation into five main tasks: open-ended text generation, summarization, translation, paraphrasing, and question answering. For each task, we review its relevant characteristics, sub-tasks, and specific challenges (e.g., missing datasets for multi-document summarization, coherence in story generation, and complex reasoning for question answering). Additionally, we assess current approaches for evaluating text generation systems and identify problems with current metrics. Our investigation reveals nine prominent challenges common to all tasks and sub-tasks in recent text generation publications: bias, reasoning, hallucinations, misuse, privacy, interpretability, transparency, datasets, and computing. We provide a detailed analysis of these challenges, their potential solutions, and the gaps that still require further engagement from the community. This systematic literature review targets two main audiences: early career researchers in natural language processing looking for an overview of the field and promising research directions, and experienced researchers seeking a detailed view of tasks, evaluation methodologies, open challenges, and recent mitigation strategies.