The ability of large language models to generate coherent multilingual text has improved steadily in recent years, which raises concerns about their potential misuse. Previous research has shown that they can be misused to generate personalized disinformation in multiple languages. It has also been observed that personalization negatively affects the detectability of machine-generated texts; however, this has been studied only in English. In this work, we examine this phenomenon across 10 languages, focusing not only on the potential misuse of personalization capabilities but also on the potential benefits they offer. Overall, we cover 1080 combinations of personalization aspects in the prompts, for each of which texts are generated by 16 distinct language models (17,280 texts in total). Our results indicate differences across languages in the personalization quality of the generated texts, both when targeting demographic groups and when targeting social-media platforms. Personalization towards platforms affects the detectability of the generated texts to a greater extent, especially in English, where the personalization quality is the highest.