Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest

20 December 2023

Emily Groves, Minhong Wang, Yusuf Abdulle, Holger Kunz, Jason Hoelscher-Obermaier, Ronin Wu, Honghan Wu

From arXiv; 26 pages, 5 figures, 14 tables

Automated knowledge curation for biomedical ontologies is key to ensuring that they remain comprehensive, high-quality and up-to-date. In the era of foundational language models, this study compares and analyzes three NLP paradigms for curation tasks: in-context learning (ICL), fine-tuning (FT), and supervised learning (ML). Using the Chemical Entities of Biological Interest (ChEBI) database as a model ontology, three curation tasks were devised. For ICL, three prompting strategies were employed with GPT-4, GPT-3.5, and BioGPT. PubMedBERT was chosen for the FT paradigm. For ML, six embedding models were used to train Random Forest and Long Short-Term Memory models. Five setups were designed to assess ML and FT model performance across different data availability scenarios. Datasets for the curation tasks included: task 1 (620,386), task 2 (611,430), and task 3 (617,381), maintaining a 50:50 positive versus negative ratio. Among ICL models, GPT-4 achieved the best accuracy scores of 0.916, 0.766 and 0.874 for tasks 1-3 respectively. In a direct comparison, ML (trained on ~260,000 triples) outperformed ICL in accuracy across all tasks (accuracy differences: +0.11, +0.22 and +0.17). Fine-tuned PubMedBERT performed similarly to the leading ML models in tasks 1 and 2 (F1 differences: -0.014 and +0.002), but worse in task 3 (-0.048). Simulations revealed performance declines in both ML and FT models with smaller and more imbalanced training data, conditions under which ICL (particularly GPT-4) excelled in tasks 1 and 3: with fewer than 6,000 triples, GPT-4 surpassed ML/FT in those tasks. ICL underperformed ML/FT in task 2. ICL-augmented foundation models can be good assistants for knowledge curation with correct prompting, but they do not make the ML and FT paradigms obsolete. The latter two require task-specific data to beat ICL. In such cases, ML relies on small pretrained embeddings, minimizing computational demands.
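The evaluation setup described in the abstract (triples labeled positive or negative at a 50:50 ratio, scored by accuracy and F1) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not say how negative triples were generated, so the object-corruption strategy, the function names, and the three example triples are hypothetical and are not the authors' pipeline.

```python
import random

# A triple is (subject, relation, object). The examples are illustrative only.
Triple = tuple[str, str, str]

POSITIVES: list[Triple] = [
    ("ethanol", "is_a", "primary alcohol"),
    ("D-glucose", "is_a", "aldohexose"),
    ("caffeine", "has_role", "central nervous system stimulant"),
]


def make_balanced_dataset(positives: list[Triple], seed: int = 0):
    """Pair every positive triple with one negative triple (50:50 ratio).

    ASSUMPTION: negatives are made by swapping in a different object taken
    from the same pool; the paper may build its negatives differently.
    """
    rng = random.Random(seed)
    known = set(positives)
    objects = sorted({obj for _, _, obj in positives})
    dataset = [(triple, 1) for triple in positives]
    for subj, rel, _ in positives:
        # Only keep replacements that do not recreate a known positive triple.
        candidates = [o for o in objects if (subj, rel, o) not in known]
        if not candidates:
            raise ValueError(f"no negative object available for {subj!r}")
        dataset.append(((subj, rel, rng.choice(candidates)), 0))
    rng.shuffle(dataset)
    return dataset


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    data = make_balanced_dataset(POSITIVES)
    gold = [label for _, label in data]
    # Stand-in classifier that accepts every triple; on a balanced set
    # this baseline is correct exactly half of the time.
    predicted = [1] * len(gold)
    print("accuracy:", accuracy(gold, predicted))
    print("F1:", f1_score(gold, predicted))
```

On a balanced dataset a classifier that accepts every triple scores an accuracy of 0.5, which gives a reference point for reading the accuracy figures reported above.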
