Wikidata is currently the largest open knowledge graph on the web, comprising over 120 million entities. It integrates data from various domain-specific databases, imports a substantial amount of content from Wikipedia, and allows users to edit its content freely. This openness has made Wikidata a central resource in knowledge graph research and has given users worldwide convenient access to knowledge. However, its relatively loose editorial policy has also led to taxonomic inconsistencies. Building on prior work, this study proposes and applies a novel validation method to confirm the presence of classification errors, over-generalized subclass links, and redundant connections in specific domains of Wikidata. We further introduce a new evaluation criterion for determining whether such issues warrant correction, and we develop a system that lets users inspect the taxonomic relationships of arbitrary Wikidata entities, making full use of the platform's crowdsourced nature.