Using Large Language Models to Automate Category and Trend Analysis of Scientific Articles: An Application in Ophthalmology

Hina Raja, Asim Munawar, Mohammad Delsoz, Mohammad Elahi, Yeganeh Madadi, Amr Hassan, Hashem Abu Serhan, Onur Inam, Luis Hermandez, Sang Tran, Wuqas Munir, Alaa Abd-Alrazaq, Hao Chen, Siamak Yousefi

Purpose: In this paper, we present an automated method for article classification that leverages Large Language Models (LLMs). The primary focus is on the field of ophthalmology, but the model is extendable to other fields. Methods: We developed a model based on Natural Language Processing (NLP) techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we employed zero-shot learning (ZSL) LLMs and compared them against Bidirectional and Auto-Regressive Transformers (BART) and its variants, and Bidirectional Encoder Representations from Transformers (BERT) and its variants, such as DistilBERT, SciBERT, PubMedBERT, and BioBERT. Results: The classification results demonstrate the effectiveness of LLMs in categorizing a large number of ophthalmology papers without human intervention. To evaluate the LLMs, we compiled a dataset (RenD) of 1000 ocular disease-related articles, which were annotated by a panel of six specialists into 15 distinct categories. The model achieved a mean accuracy of 0.86 and a mean F1 of 0.85 on the RenD dataset. Conclusion: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in ophthalmology showcases its potential for knowledge organization and retrieval in other domains as well. We also performed a trend analysis that enables researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering, and to identify emerging scientific trends within different disciplines. Moreover, the extensibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines.
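
The abstract reports a mean accuracy and a mean F1 over annotated categories but does not say how the F1 was averaged. As a minimal sketch, assuming a macro average (the unweighted mean of per-category F1 scores), the two metrics could be computed as follows. The function names and the three category labels are hypothetical and chosen only for illustration; they are not taken from the paper or from the RenD dataset.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of articles whose predicted category equals the annotation."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 (assumed averaging scheme)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the annotation says otherwise
            fn[t] += 1  # annotated t, but the model missed it
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical annotations and predictions for six articles.
y_true = ["glaucoma", "glaucoma", "cataract", "retina", "retina", "retina"]
y_pred = ["glaucoma", "cataract", "cataract", "retina", "retina", "glaucoma"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

With a macro average every category counts equally regardless of how many articles it contains, which matters when the categories are imbalanced; a support-weighted average would give a different number on the same predictions.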
