Curated knowledge graphs encode domain expertise and improve the performance of recommendation, segmentation, ad targeting, and other machine learning systems in several domains. As new concepts emerge in a domain, knowledge graphs must be expanded to preserve machine learning performance. Manually expanding knowledge graphs, however, is infeasible at scale. In this work, we propose a method for knowledge graph expansion with humans in the loop. Concretely, given a knowledge graph, our method predicts the "parents" of new concepts to be added to the graph, and human experts then verify these predictions. We show that our method is both accurate and provably "human-friendly". Specifically, we prove that our method predicts parents that are "near" concepts' true parents in the knowledge graph, even when the predictions are incorrect. We then show, with a controlled experiment, that satisfying this property increases both the speed and the accuracy of the human-algorithm collaboration. We further evaluate our method on a knowledge graph from Pinterest and show that it outperforms competing methods on both accuracy and human-friendliness. Upon deployment in production at Pinterest, our method reduced the time needed for knowledge graph expansion by ~400% (compared to manual expansion), and contributed to a subsequent increase in ad revenue of 20%.