We analyze the performance of large language models (LLMs) on Text Style Transfer (TST), focusing on sentiment transfer and text detoxification across three languages: English, Hindi, and Bengali. Text Style Transfer involves modifying the linguistic style of a text while preserving its core content. We evaluate the capabilities of pre-trained LLMs using zero-shot and few-shot prompting as well as parameter-efficient finetuning on publicly available datasets. Our evaluation using automatic metrics, GPT-4, and human judgments reveals that while some prompted LLMs perform well in English, their performance on other languages (Hindi, Bengali) remains average. However, finetuning significantly improves results over zero-shot and few-shot prompting, making them comparable to the previous state of the art. This underscores the necessity of dedicated datasets and specialized models for effective TST.