Purpose: In this paper, we present an automated method for article classification that leverages Large Language Models (LLMs). The primary focus is the field of ophthalmology, but the model is extensible to other fields. Methods: We developed a model based on Natural Language Processing (NLP) techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we employed zero-shot learning (ZSL) LLMs and compared them against Bidirectional and Auto-Regressive Transformers (BART) and its variants, and against Bidirectional Encoder Representations from Transformers (BERT) and its variants, such as distilBERT, SciBERT, PubmedBERT, and BioBERT. Results: The classification results demonstrate the effectiveness of LLMs in categorizing a large number of ophthalmology papers without human intervention. To evaluate the LLMs, we compiled a dataset (RenD) of 1000 ocular disease-related articles, which a panel of six specialists annotated into 15 distinct categories. On the RenD dataset, the model achieved a mean accuracy of 0.86 and a mean F1 of 0.85. Conclusion: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application to ophthalmology showcases its potential for knowledge organization and retrieval in other domains as well. We also performed a trend analysis that enables researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering, and helping to identify emerging scientific trends within different disciplines. Moreover, the extensibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines.
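The two reported metrics, mean accuracy and mean F1, can be illustrated with a minimal sketch. It assumes that "mean F1" denotes the unweighted (macro) average of per-category F1 scores, which the abstract does not state explicitly. The category names and predictions below are hypothetical placeholders and are not taken from the RenD dataset.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of articles whose predicted category matches the expert label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 scores (assumed meaning of 'mean F1')."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted category p, but the expert label differs
            fn[t] += 1  # expert label t was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical expert labels and model predictions for four articles.
expert = ["glaucoma", "glaucoma", "cataract", "retina"]
predicted = ["glaucoma", "cataract", "cataract", "retina"]

acc = accuracy(expert, predicted)
f1 = macro_f1(expert, predicted)
print(f"accuracy={acc:.2f}, macro F1={f1:.2f}")
```

Macro averaging weights each of the categories equally regardless of how many articles it contains; if the categories are imbalanced, a support-weighted average would give a different value, so the choice of averaging should be checked against the full paper.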