The construction of large open knowledge bases (OKBs) is integral to many applications in mobile computing. Noun phrases and relational phrases in OKBs often suffer from redundancy and ambiguity, which motivates research on OKB canonicalization. However, to comply with privacy protection regulations and to keep the data current, sensitive or outdated information often has to be removed from a canonicalized OKB. Machine unlearning in OKB canonicalization is a natural solution to this problem. Current approaches to OKB canonicalization devise advanced clustering algorithms and use knowledge graph embedding (KGE) to further facilitate the canonicalization process, so effective schemes that fully integrate machine unlearning with clustering and KGE learning are urgently needed. To this end, we put forward a multi-task unlearning framework, MulCanon, to tackle the machine unlearning problem in OKB canonicalization. Specifically, MulCanon exploits the noise characteristics of the diffusion model to achieve machine unlearning for data in the OKB. It unifies the learning objectives of the diffusion model, KGE, and clustering algorithms, and adopts a two-step multi-task learning paradigm for training. A thorough experimental study on popular OKB canonicalization datasets validates that MulCanon achieves effective machine unlearning.
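The abstract does not spell out how diffusion noise is applied, so the following is only a minimal sketch of one plausible reading: the embeddings of the entries to be forgotten are pushed through the standard DDPM forward (noising) process, while all other embeddings are left untouched. The function names `forward_diffuse` and `noise_forget_set`, the linear beta schedule, and the embedding shapes are assumptions made for illustration and are not taken from MulCanon.

```python
import numpy as np


def forward_diffuse(x0, t, betas, rng):
    """Standard DDPM forward step:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]  # cumulative signal-retention factor at step t
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def noise_forget_set(embeddings, forget_idx, t, betas, seed=0):
    """Hypothetical helper: noise only the rows listed in forget_idx.

    Assumption: each row is the embedding of one noun or relational phrase.
    """
    rng = np.random.default_rng(seed)
    out = embeddings.copy()  # retained rows stay bit-identical
    out[forget_idx] = forward_diffuse(embeddings[forget_idx], t, betas, rng)
    return out


# Illustrative usage with random stand-in embeddings (not real OKB data).
embeddings = np.random.default_rng(42).standard_normal((5, 8))
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear noise schedule
noised = noise_forget_set(embeddings, forget_idx=[1, 3], t=999, betas=betas)
```

At the last step of this schedule the cumulative retention factor is close to zero, so the noised rows carry almost none of the original signal. How such noised representations would then interact with the KGE and clustering objectives during the two-step training is not covered by this sketch.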