The rapid growth of scientific publications, particularly during the COVID-19 pandemic, emphasizes the need for tools to help researchers efficiently comprehend the latest advancements. One essential part of understanding scientific literature is research aspect classification, which categorizes sentences in abstracts into Background, Purpose, Method, and Finding. In this study, we investigate the impact of different datasets on model performance for the crowd-annotated CODA-19 research aspect classification task. Specifically, we explore the potential benefits of using the large, automatically curated PubMed 200K RCT dataset and evaluate the effectiveness of large language models (LLMs), such as LLaMA, GPT-3, ChatGPT, and GPT-4. Our results indicate that using the PubMed 200K RCT dataset does not improve performance for the CODA-19 task. We also observe that while GPT-4 performs well, it does not outperform the SciBERT model fine-tuned on the CODA-19 dataset, emphasizing the importance of a dedicated, task-aligned dataset for the target task. Our code is available at https://github.com/Crowd-AI-Lab/CODA-19-exp.