Temporal relation prediction in incomplete temporal knowledge graphs (TKGs) is a popular temporal knowledge graph completion (TKGC) problem in both transductive and inductive settings. Traditional embedding-based TKGC models (TKGE) rely on structured connections and can only handle a fixed set of entities, i.e., the transductive setting. In the inductive setting, where test TKGs contain emerging entities, the latest methods are based on symbolic rules or pre-trained language models (PLMs); however, they are inflexible and not time-specific, respectively. In this work, we extend the fully-inductive setting, where entities in the training and test sets are totally disjoint, to TKGs, and take a further step towards more flexible and time-sensitive temporal relation prediction with SST-BERT, which incorporates Structured Sentences with Time-enhanced BERT. By encoding structured sentences, our model can obtain entity histories and implicitly learn rules in the semantic space, which addresses the inflexibility problem. To enhance the time sensitivity of SST-BERT, we propose a time masking masked language modeling (MLM) task to pre-train BERT on a corpus rich in temporal tokens, generated specifically for TKGs. To compute the probability that a target quadruple occurs, we aggregate all of its structured sentences into a score from both temporal and semantic perspectives. Experiments on transductive datasets and newly generated fully-inductive benchmarks show that SST-BERT improves over state-of-the-art baselines.
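The abstract names a time masking MLM task but does not say how time tokens are identified or how many are masked. The following is a minimal sketch of the general idea, not SST-BERT's actual procedure. It assumes that timestamps appear in a structured sentence as date-formatted tokens such as `2014-11-10` and that masking is restricted to those tokens, in contrast to standard BERT MLM, which samples about 15% of all tokens. The names `time_mask`, `TIME_PATTERN`, and `mask_prob` are hypothetical.

```python
import random
import re

# Hypothetical rule: a "time token" looks like YYYY, YYYY-MM, or YYYY-MM-DD.
TIME_PATTERN = re.compile(r"^\d{4}(-\d{2}){0,2}$")
MASK_TOKEN = "[MASK]"


def time_mask(tokens, mask_prob=1.0, rng=None):
    """Mask only time tokens, leaving entity and relation tokens intact.

    Returns (masked_tokens, labels), where labels[i] holds the original
    time token the model must predict, or None if position i is not a target.
    """
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        # Only time tokens are masking candidates (assumed, not from the source).
        if TIME_PATTERN.match(tok) and rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels
```

For example, `time_mask("A visits B on 2014-11-10".split())` masks only the date. With `mask_prob=0.0`, nothing is masked. Focusing the prediction targets on timestamps is one plausible way to push a language model toward temporal sensitivity.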