Our research explores the use of natural language processing (NLP) methods to automatically classify entities for the purpose of knowledge graph population and integration with food system ontologies. We have created NLP models that automatically classify organizations with respect to categories associated with environmental issues, as well as Standard Industrial Classification (SIC) codes, which the U.S. government uses to characterize business activities. As input, the models receive text snippets retrieved by the Google search engine for each organization; these snippets serve as the textual description of the organization used for learning. Our experimental results show that NLP models can achieve reasonably good performance on both classification tasks, and the models rely on a general framework that could be applied to many other classification problems. We believe that NLP models represent a promising approach for automatically harvesting information to populate knowledge graphs and for aligning that information with existing ontologies through shared categories and concepts.
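The general framework described above (a snippet of text per organization, mapped to a category) can be illustrated with a minimal sketch. This is not the models used in this work: it swaps in a small bag-of-words multinomial naive Bayes classifier written with the Python standard library, and the class name `SnippetClassifier`, the toy snippets, and the labels `agriculture` and `retail` are all hypothetical stand-ins for real search snippets and real SIC or environmental categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class SnippetClassifier:
    """Multinomial naive Bayes over bag-of-words snippet text (illustrative only)."""

    def fit(self, snippets, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of training snippets
        self.vocab = set()
        for text, label in zip(snippets, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, snippet):
        tokens = tokenize(snippet)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            # Add-one (Laplace) smoothing so unseen tokens do not zero out a class.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for search-engine snippets about organizations.
train_snippets = [
    "Family farm growing organic vegetables and grain crops",
    "Grower cooperative harvesting wheat, corn and soybean crops",
    "Regional supermarket chain selling groceries and fresh produce",
    "Neighborhood grocery store and retail food market",
]
train_labels = ["agriculture", "agriculture", "retail", "retail"]

model = SnippetClassifier().fit(train_snippets, train_labels)
prediction = model.predict("Organic farm growing corn and wheat crops")
```

On this toy data, `prediction` is `"agriculture"`. The same `fit`/`predict` shape would apply to either task in principle, with only the label set changing, which is the sense in which the framework is general.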