We present MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages: English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic. We benchmark a range of popular multilingual pre-trained language models on the dataset. The dataset is released along with SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis (https://sites.google.com/umich.edu/semeval-2023-tweet-intimacy).