Recent work on cross-lingual relation extraction (XRE) relies heavily on the language-consistent structural features provided by universal dependency (UD) resources. Because of the inevitable linguistic disparity between languages, however, such methods can suffer from biased transfer, being either target-biased or source-biased. In this work, we investigate unbiased UD-based XRE transfer by constructing a code-mixed UD forest. We first translate each source-language sentence into the target language and parse the UD tree of both the original sentence and its translation. We then merge the source-side and target-side UD structures into a unified code-mixed UD forest. With these forest features, the gap in UD-based XRE between the training and prediction phases can be effectively closed. Experiments on the ACE XRE benchmark datasets show that the proposed code-mixed UD forests support unbiased UD-based XRE transfer and yield significant performance gains.
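The merging step can be illustrated with a minimal sketch. It is not the procedure used in this work: the abstract does not say how the two trees are combined, so the sketch assumes a one-to-one word alignment is available and simply fuses each aligned word pair into a single code-mixed node under a shared root. The names `Token` and `merge_ud_trees`, the alignment input, and the hand-written toy trees (not real parser output) are all hypothetical.

```python
from dataclasses import dataclass

ROOT = "ROOT"


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in its own sentence
    form: str    # surface word
    head: int    # index of the head token; 0 marks the root
    deprel: str  # UD relation label


def merge_ud_trees(src, tgt, alignment):
    """Merge two UD trees into one graph under a shared ROOT.

    `alignment` maps a source token index to its aligned target token
    index (assumed one-to-one for this sketch). Returns (labels, edges):
    `labels` maps node id -> display string, and `edges` is a set of
    (head_node, deprel, dependent_node) triples.
    """
    tgt_forms = {t.idx: t.form for t in tgt}
    node_of = {}            # (side, token idx) -> node id
    labels = {ROOT: ROOT}

    # Aligned source words become code-mixed nodes shared by both sides.
    for tok in src:
        if tok.idx in alignment:
            t_idx = alignment[tok.idx]
            node = ("mix", tok.idx, t_idx)
            labels[node] = f"{tok.form}|{tgt_forms[t_idx]}"
            node_of[("tgt", t_idx)] = node
        else:
            node = ("src", tok.idx)
            labels[node] = tok.form
        node_of[("src", tok.idx)] = node

    # Target words without an aligned source word keep their own node.
    for tok in tgt:
        if ("tgt", tok.idx) not in node_of:
            node = ("tgt", tok.idx)
            labels[node] = tok.form
            node_of[("tgt", tok.idx)] = node

    # Project the arcs of both trees onto the merged nodes; arcs that
    # coincide after merging collapse because `edges` is a set.
    edges = set()
    for side, tokens in (("src", src), ("tgt", tgt)):
        for tok in tokens:
            head = ROOT if tok.head == 0 else node_of[(side, tok.head)]
            edges.add((head, tok.deprel, node_of[(side, tok.idx)]))
    return labels, edges


# Toy English sentence and a hand-written Chinese counterpart.
src = [Token(1, "Obama", 2, "nsubj"),
       Token(2, "visited", 0, "root"),
       Token(3, "Paris", 2, "obj")]
tgt = [Token(1, "奥巴马", 2, "nsubj"),
       Token(2, "访问", 0, "root"),
       Token(3, "了", 2, "aux"),
       Token(4, "巴黎", 2, "obj")]
labels, edges = merge_ud_trees(src, tgt, {1: 1, 2: 2, 3: 4})
```

In this toy case the three aligned pairs share their arcs, and the unaligned target word contributes one extra arc. That shows how a merged structure can expose both source-side and target-side syntax to a single encoder.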