We develop novel annotation guidelines for sentence-level subjectivity detection that are not limited to language-specific cues. Using these guidelines, we collect NewsSD-ENG, a corpus of 638 objective and 411 subjective sentences extracted from English news articles on controversial topics. The corpus paves the way for subjectivity detection in English and in other languages without relying on language-specific tools such as lexicons or machine translation. We evaluate state-of-the-art multilingual transformer-based models on the task in mono-, multi-, and cross-language settings; to support this evaluation, we re-annotate an existing Italian corpus. Models trained in the multilingual setting achieve the best performance on the task.
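To put the reported class counts in perspective, the following minimal sketch computes the label distribution of NewsSD-ENG from the two figures given above. The always-predict-the-majority classifier is a hypothetical reference point introduced here for illustration only; it is not a baseline reported in this work, and the variable and function names are assumptions.

```python
# Class counts for NewsSD-ENG as stated in the abstract.
counts = {"objective": 638, "subjective": 411}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Hypothetical reference point (not from the paper): a classifier that
# always predicts the most frequent label.
majority_label = max(counts, key=counts.get)
majority_accuracy = counts[majority_label] / total


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Per-class F1 of the always-majority classifier.
per_class_f1 = {}
for label, n in counts.items():
    if label == majority_label:
        # Every majority sentence is a true positive; every other sentence is a false positive.
        per_class_f1[label] = f1(tp=n, fp=total - n, fn=0)
    else:
        # The minority label is never predicted, so all of its sentences are false negatives.
        per_class_f1[label] = f1(tp=0, fp=0, fn=n)

macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

print(f"total sentences: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority-class accuracy: {majority_accuracy:.3f}")
print(f"majority-class macro-F1: {macro_f1:.3f}")
```

Because the corpus is moderately imbalanced, accuracy alone can flatter a trivial classifier, which is one reason a class-averaged metric such as macro-F1 is a reasonable companion when reading results on data like this.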