Word Sense Disambiguation (WSD) has been widely evaluated using the semantic frameworks of WordNet, BabelNet, and the Oxford Dictionary of English. However, for the UCREL Semantic Analysis System (USAS) framework, no open, extensive evaluation has been performed beyond lexical coverage or single-language evaluation. In this work, we perform the largest semantic tagging evaluation to date of the rule-based system that uses the lexical resources in the USAS framework, covering five languages with four existing datasets and one novel Chinese dataset. To overcome the lack of manually tagged training data, we create a new silver-labelled English dataset and use it to train various monolingual and multilingual neural models. We evaluate these models in both monolingual and cross-lingual setups, compare them with their rule-based counterparts, and show how a rule-based system can be enhanced with a neural network model. The resulting neural network models, the data they were trained on, the Chinese evaluation dataset, and all of the code have been released as open resources.