This paper presents a cross-lingual ontology alignment system based on embedding-based cosine similarity matching. We make the ontology entities contextually richer by creating descriptions for them using novel techniques. We use a fine-tuned, transformer-based multilingual model to generate better embeddings. We use cosine similarity to find candidate matching pairs of ontology entities, then apply threshold filtering to retain only highly similar pairs. We evaluate the system on the OAEI 2022 MultiFarm track, where it achieves an F1 score of 71% (78% recall, 65% precision), a 16% increase over the best baseline score. This suggests that the proposed alignment pipeline captures subtle cross-lingual similarities.
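The matching and filtering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine`, `align`, `f1_score`), the three-dimensional toy vectors, the entity labels, and the threshold of 0.9 are all assumptions made for illustration, and a real system would obtain its embeddings from the fine-tuned multilingual model rather than hard-coding them. The sketch picks, for each source entity, the target entity with the highest cosine similarity and keeps the pair only if that similarity reaches the threshold. The small `f1_score` helper shows that the reported precision and recall are consistent with the reported F1.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def align(source, target, threshold=0.9):
    """Return (source_name, target_name, score) for each source entity whose
    best-scoring target entity reaches the threshold.

    `source` and `target` map entity names to embedding vectors.
    The default threshold is an assumed value, not one taken from the paper.
    """
    matches = []
    for s_name, s_vec in source.items():
        best_name, best_score = None, -1.0
        for t_name, t_vec in target.items():
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_name, best_score = t_name, score
        # Threshold filtering: drop pairs that are not similar enough.
        if best_name is not None and best_score >= threshold:
            matches.append((s_name, best_name, best_score))
    return matches


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy embeddings standing in for model output.
source = {
    "Paper": [1.0, 0.0, 0.2],
    "Author": [0.0, 1.0, 0.1],
    "Review": [0.6, 0.6, 0.0],
}
target = {
    "Artikel": [0.9, 0.1, 0.2],
    "Autor": [0.1, 0.9, 0.0],
    "Konferenz": [0.5, 0.5, 0.9],
}

for s_name, t_name, score in align(source, target, threshold=0.9):
    print(f"{s_name} -> {t_name} ({score:.3f})")

# Consistency check on the reported scores: P = 0.65, R = 0.78.
print(round(f1_score(0.65, 0.78), 2))
```

In this toy data, `Review` has no target above the threshold and is dropped, which is the intended effect of the filtering step: a higher threshold trades recall for precision.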