Entity and relationship extraction is a crucial component of natural language processing tasks such as knowledge graph construction, question answering, and semantic analysis. Most of the knowledge of the Yishui school of traditional Chinese medicine (TCM) is stored as unstructured classical Chinese text, so extracting key information from these texts is important for mining and studying the academic schools of TCM. To address this extraction task efficiently with artificial intelligence methods, this study builds a word segmentation and entity relationship extraction model based on conditional random fields (CRF) to identify and extract entity relationships in TCM texts, and applies TF-IDF, a weighting technique commonly used in information retrieval and data mining, to extract the key entities in different ancient books. A neural-network-based dependency parser is then used to analyze the grammatical relationships between entities in each text, and the results are visualized as tree structures. This work lays the foundation for constructing a knowledge graph of the Yishui school and for applying artificial intelligence methods to research on TCM academic schools.
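To make the TF-IDF weighting step concrete, the following is a minimal sketch of how entities already extracted from each book could be ranked. It is not the implementation used in this study: the function names `tf_idf` and `top_entities`, the book titles, and the entity lists are all hypothetical, and the sketch assumes the plain formulation (relative term frequency multiplied by log(N / document frequency)) with no smoothing.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score entities per book with plain TF-IDF.

    docs maps a book title to the list of entity strings extracted from it.
    Assumption: tf = count / total entities in the book,
                idf = log(number of books / books containing the entity).
    """
    n_docs = len(docs)
    # Document frequency: in how many books does each entity occur?
    df = Counter()
    for entities in docs.values():
        df.update(set(entities))

    scores = {}
    for title, entities in docs.items():
        counts = Counter(entities)
        total = len(entities)
        if total == 0:
            # A book with no extracted entities gets no scores.
            scores[title] = {}
            continue
        scores[title] = {
            entity: (count / total) * math.log(n_docs / df[entity])
            for entity, count in counts.items()
        }
    return scores


def top_entities(scores, title, k=3):
    """Return the k highest-scoring entities of one book (ties broken by name)."""
    ranked = sorted(scores[title].items(), key=lambda kv: (-kv[1], kv[0]))
    return [entity for entity, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up entity lists standing in for three ancient books.
    example = {
        "book_a": ["ginseng", "ginseng", "licorice"],
        "book_b": ["licorice", "ephedra"],
        "book_c": ["licorice", "cinnamon", "cinnamon"],
    }
    print(top_entities(tf_idf(example), "book_a", k=1))
```

Under this formulation an entity that occurs in every book receives a weight of zero, which is why TF-IDF tends to surface the entities that distinguish one book from the others rather than those shared by all of them.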