The information explosion in the form of electronic theses and dissertations (ETDs) poses a challenge for managing this material and extracting appropriate knowledge for decision-making. This study proposes a solution by applying topic mining and prediction modeling tools to 263 ETDs in the field of library science submitted to the PQDT Global database during 2016-18. The study was divided into two phases. The first phase determined the core topics of the ETDs using Topic-Modeling-Tool (TMT), which is based on latent Dirichlet allocation (LDA); the second phase employed prediction analysis on the RapidMiner platform to annotate future research articles on the basis of the modeled topics. The core topics (tags) for the studied period were found to be book history, school librarian, public library, communicative ecology, and informatics; these were followed by text network and trend analysis on the high-probability co-occurring words. Lastly, a prediction model using a Support Vector Machine (SVM) classifier was created to predict the placement of future ETDs submitted to PQDT Global under the five modeled topics (a to e). When the test dataset was evaluated against the training dataset, the predictive model performed perfectly.
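As a rough illustration of the first phase, the sketch below fits LDA to a toy corpus with a collapsed Gibbs sampler and tags each document with its dominant topic. It is not the TMT implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the four toy documents are all assumptions made for this example, and only the Python standard library is used.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (doc_topic, topic_word, vocab): doc_topic[d] is the smoothed
    topic distribution of document d, topic_word[k][v] is the number of
    tokens of vocab[v] assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return doc_topic, n_kw, vocab

# Toy corpus (hypothetical; stands in for the tokenized ETD texts).
docs = [
    "school librarian reading program school students".split(),
    "public library community services public patrons".split(),
    "school students librarian teaching reading".split(),
    "public library patrons community outreach".split(),
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)

# Tag each document with its highest-probability topic.
tags = [max(range(2), key=lambda t: dist[t]) for dist in doc_topic]
```

In the workflow the abstract describes, dominant-topic tags of this kind would presumably serve as the class labels on which the SVM classifier is trained in the second phase.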