We present GUA-SPA at IberLEF 2023, the first shared task on detecting and analyzing code-switching between Guarani and Spanish. The challenge consisted of three tasks: identifying the language of each token, named entity recognition (NER), and a novel task of classifying how a Spanish span is used in the code-switched context. We annotated a corpus of 1500 texts extracted from news articles and tweets, around 25 thousand tokens, with the information needed for the three tasks. Three teams took part in the evaluation phase, obtaining generally good results for Task 1 and more mixed results for Tasks 2 and 3.