Large language models (LLMs) are strong text generators, and practitioners typically face a tradeoff between fine-tuning and prompt engineering. We introduce Simplify-This, a comparative study of both paradigms for text simplification with encoder-decoder LLMs, evaluated across multiple benchmarks with a range of automatic metrics. Fine-tuned models consistently deliver stronger structural simplification, whereas prompting often attains higher semantic-similarity scores yet tends to copy the input. A human evaluation favors fine-tuned outputs overall. To facilitate reproducibility and future work, we release our code, a cleaned derivative dataset used in the study, checkpoints of the fine-tuned models, and our prompt templates.