Multiple Choice Question Answering (MCQA) is an important problem with numerous real-world applications in domains such as medicine, law, and education. The high cost of building MCQA datasets makes few-shot learning pivotal in this domain. While Large Language Models (LLMs) enable few-shot learning, their direct application in real-world scenarios is often hindered by their high computational cost. To address this challenge, we propose a simple yet effective approach that uses LLMs for data generation and scoring. Our approach uses LLMs to create MCQA data containing questions and answer choices, and to assign probability scores to the generated choices. We then use the generated data and LLM-assigned scores to finetune a smaller, more efficient encoder-only model, DeBERTa-v3-base, with a distillation loss. Extensive experiments on the Massive Multitask Language Understanding (MMLU) benchmark demonstrate that our method improves accuracy from 28.9% to 39.3%, a gain of over 10 percentage points compared to a baseline finetuned directly on 5-shot examples. This shows the effectiveness of LLM-driven data generation and knowledge distillation for few-shot MCQA.