We introduce Spivavtor, a dataset and instruction-tuned models for text editing in the Ukrainian language. Spivavtor is the Ukrainian-focused adaptation of the English-only CoEdIT model. Like CoEdIT, Spivavtor performs text editing tasks by following instructions, here written in Ukrainian. This paper describes the Spivavtor-Instruct dataset and the Spivavtor models in detail. We evaluate Spivavtor on a variety of text editing tasks in Ukrainian, namely Grammatical Error Correction (GEC), Text Simplification, Coherence, and Paraphrasing, and demonstrate its superior performance on all of them. We publicly release our best-performing models and data as resources for the community to advance further research in this space.