Recently, Large Language Models (LLMs) have garnered increasing attention in natural language processing, revolutionizing numerous downstream tasks with their powerful reasoning and generation abilities. For example, In-Context Learning (ICL) offers a fine-tuning-free paradigm in which out-of-the-box LLMs perform downstream tasks by learning from a few demonstrations by analogy. In the fine-tuning-dependent paradigm, where substantial training data is available, Parameter-Efficient Fine-Tuning (PEFT) provides a cost-effective way for LLMs to reach performance comparable to full fine-tuning. However, these techniques have not been fully exploited in the field of Aspect-Based Sentiment Analysis (ABSA). Previous works probe LLMs on ABSA merely by using randomly selected input-output pairs as ICL demonstrations, yielding an incomplete and superficial evaluation. In this paper, we present a comprehensive evaluation of LLMs on ABSA, covering 13 datasets, 8 ABSA subtasks, and 6 LLMs. Specifically, we design a unified task formulation that accommodates ``multiple LLMs for multiple ABSA subtasks in multiple paradigms.'' For the fine-tuning-dependent paradigm, we efficiently fine-tune LLMs with instruction-based multi-task learning. For the fine-tuning-free paradigm, we propose 3 demonstration selection strategies to stimulate the few-shot abilities of LLMs. Our extensive experiments demonstrate that, in the fine-tuning-dependent paradigm, LLMs achieve new state-of-the-art performance compared to fine-tuned Small Language Models (SLMs). More importantly, in the fine-tuning-free paradigm, where SLMs are ineffective, LLMs with ICL still showcase impressive potential and even compete with fine-tuned SLMs on some ABSA subtasks.
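To make the fine-tuning-free setup concrete, below is a minimal sketch of one plausible demonstration selection strategy for ICL. The abstract does not detail the 3 strategies, so the retrieval criterion here (TF-IDF cosine similarity between the test sentence and candidate training inputs) and all function names are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch: similarity-based demonstration selection for ICL.
# Assumption: the paper's real strategies may differ; this only shows the
# general pattern of picking demonstrations and building a few-shot prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def select_demonstrations(test_sentence, train_pairs, k=4):
    """Pick the k (input, output) pairs whose inputs are most similar to the test sentence."""
    inputs = [text for text, _ in train_pairs]
    vectorizer = TfidfVectorizer().fit(inputs + [test_sentence])
    train_vecs = vectorizer.transform(inputs)
    test_vec = vectorizer.transform([test_sentence])
    scores = cosine_similarity(test_vec, train_vecs)[0]
    top_idx = scores.argsort()[::-1][:k]
    return [train_pairs[i] for i in top_idx]


def build_icl_prompt(test_sentence, demos, instruction):
    """Concatenate the task instruction, selected demonstrations, and the test input."""
    lines = [instruction]
    for text, label in demos:
        lines.append(f"Input: {text}\nOutput: {label}")
    lines.append(f"Input: {test_sentence}\nOutput:")
    return "\n\n".join(lines)


# Toy usage on an aspect-sentiment pair extraction subtask (labels are made up).
train_pairs = [
    ("The battery life is great but the screen is dim.", "(battery life, positive); (screen, negative)"),
    ("Service was slow, food was delicious.", "(service, negative); (food, positive)"),
    ("The keyboard feels cheap.", "(keyboard, negative)"),
]
demos = select_demonstrations("The screen is bright and the battery drains fast.", train_pairs, k=2)
print(build_icl_prompt(
    "The screen is bright and the battery drains fast.",
    demos,
    "Extract all (aspect, sentiment) pairs from the sentence.",
))
```

The prompt built this way is then sent to an out-of-the-box LLM without any parameter updates, which is what distinguishes the fine-tuning-free paradigm from the PEFT-based one.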