Modern supervised neural network models require large amounts of manually labeled data, which makes constructing domain-specific knowledge graphs time-consuming and labor-intensive. Although named entity recognition and relation extraction based on distant supervision have been studied extensively, constructing a domain-specific knowledge graph from large text collections without manual annotation remains an urgent, unsolved problem. In response, we propose an integrated framework for adapting and re-learning knowledge graphs from a coarse domain (biomedical) to a more finely defined domain (oncology). The framework applies distant supervision to cross-domain knowledge graph adaptation, so no manual data annotation is required to train the model. We introduce a novel iterative training strategy to facilitate the discovery of domain-specific named entities and triples. Experimental results indicate that the proposed framework can perform domain adaptation and knowledge graph construction efficiently.
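To make the idea of distant supervision concrete, the following is a minimal sketch of the general technique, not the framework's actual implementation. It assumes that a sentence mentioning both entities of a known triple can be treated as a (noisy) training example for that triple's relation. The function name `distant_label`, the seed triple, and the sentences are all hypothetical and invented for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)
Example = Tuple[str, str, str, str]  # (sentence, head, relation, tail)


def distant_label(sentences: List[str], triples: List[Triple]) -> List[Example]:
    """Label sentences automatically by matching them against known triples.

    Assumption: if a sentence contains both the head and the tail of a
    triple, it is taken as a noisy example of that triple's relation.
    Matching is a plain case-insensitive substring test, which is far
    cruder than a real entity linker.
    """
    labeled: List[Example] = []
    for sentence in sentences:
        text = sentence.lower()
        for head, relation, tail in triples:
            if head.lower() in text and tail.lower() in text:
                labeled.append((sentence, head, relation, tail))
    return labeled


# Hypothetical seed triple from a coarse-domain (biomedical) graph.
seed_triples = [("tamoxifen", "treats", "breast cancer")]

# Hypothetical unlabeled sentences from a fine-domain (oncology) corpus.
sentences = [
    "Tamoxifen is widely prescribed for breast cancer.",
    "Aspirin reduces fever.",
]

examples = distant_label(sentences, seed_triples)
for example in examples:
    print(example)
```

Only the first sentence is labeled, because it is the only one that mentions both entities of the seed triple. Labels produced this way are noisy, which is one reason an iterative strategy that retrains on newly discovered entities and triples can be useful.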