Multiword expressions are a key ingredient for developing large-scale and linguistically sound natural language processing technology. This paper describes our improvements in automatically identifying Romanian multiword expressions on the corpus released for the PARSEME v1.2 shared task. Our approach adopts a multilingual perspective, combining the recently introduced lateral inhibition layer with adversarial training to boost the performance of the multilingual language models we employ. With these two methods, we improve the F1-score of XLM-RoBERTa by approximately 2.7% on unseen multiword expressions, the main task of the PARSEME v1.2 edition. In addition, our results can be considered state-of-the-art, as they outperform the previous results on Romanian obtained by the participants in this competition.