The aim of this paper is to evaluate whether large language models trained on multiple-choice question data can discriminate between medical subjects, an important and challenging task for automatic question answering. To this end, we train deep neural networks for multi-class classification of questions into the inferred medical subjects. Our Multi-Question (MQ) Sequence-BERT method outperforms the state-of-the-art results on the MedMCQA dataset, with accuracies of 0.68 and 0.60 on its development and test sets, respectively. These results demonstrate the capability of AI, and of LLMs in particular, for multi-class classification tasks in the healthcare domain.