Despite advances in abstractive summarization with Large Language Models (LLMs), there is a lack of research assessing how readily these models adapt to different domains. We evaluate the domain adaptation abilities of a wide range of LLMs on the summarization task across various domains in both fine-tuning and in-context learning settings. We also present AdaptEval, the first domain adaptation evaluation suite. AdaptEval includes a domain benchmark and a set of metrics to facilitate the analysis of domain adaptation. Our results demonstrate that LLMs exhibit comparable performance in the in-context learning setting, regardless of their parameter scale.