While machines learn from existing corpora, humans have the unique capability to establish and accept new language systems, which leads them to form distinct language systems within social groups. In line with this, we focus on a remaining gap in addressing translation challenges within social groups, where in-group members use unique terminology. We propose the KpopMT dataset, which aims to fill this gap by enabling precise terminology translation, choosing Kpop fandom as an initial case of such social groups given its global popularity. Expert translators provide 1k English translations of Korean posts and comments, each annotated with the specific terminology of the social group's language system. We evaluate existing translation systems, including GPT models, on KpopMT to identify their failure cases. Results show overall low scores, underscoring the challenge of reflecting group-specific terminology and style in translation. We make KpopMT publicly available.