Large language models for aspect-based sentiment analysis

Large language models (LLMs) offer unprecedented text completion capabilities. As general models, they can fulfill a wide range of roles, including those of more specialized models. We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot and fine-tuned settings on the aspect-based sentiment analysis (ABSA) task. Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task of SemEval-2014 Task 4, improving upon InstructABSA [@scaria_instructabsa_2023] by 5.7%. However, this comes at the price of 1000 times more model parameters and thus increased inference cost. We discuss the cost-performance trade-offs of different models and analyze the typical errors that they make. Our results also indicate that detailed prompts improve performance in zero-shot and few-shot settings but are not necessary for fine-tuned models. This evidence is relevant for practitioners who face the choice between prompt engineering and fine-tuning when using LLMs for ABSA.
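To make the reported metric concrete, the sketch below shows one common way to score the joint task: a prediction counts as correct only if both the aspect term and its polarity match a gold annotation, and F1 is micro-averaged over all sentences. This is an illustrative assumption, not the paper's evaluation script; the function name `joint_f1`, the case-insensitive exact matching, and the toy sentences are all hypothetical.

```python
from collections import Counter


def joint_f1(gold, pred):
    """Micro-averaged F1 over (aspect term, polarity) pairs.

    gold, pred: one list per sentence, each holding (term, polarity) tuples.
    Matching rule (an assumption): case-insensitive exact term match
    with identical polarity.
    """
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_counts = Counter((term.lower(), pol) for term, pol in g)
        p_counts = Counter((term.lower(), pol) for term, pol in p)
        # Multiset intersection counts each gold pair at most once.
        tp += sum((g_counts & p_counts).values())
        n_gold += sum(g_counts.values())
        n_pred += sum(p_counts.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: two sentences, three gold pairs, four predicted pairs.
gold = [
    [("food", "positive"), ("service", "negative")],
    [("battery life", "positive")],
]
pred = [
    [("food", "positive"), ("service", "neutral")],   # wrong polarity
    [("battery life", "positive"), ("screen", "negative")],  # spurious term
]
print(round(joint_f1(gold, pred), 4))  # 0.5714
```

In the toy data, two of four predictions are correct (precision 0.5) and two of three gold pairs are recovered (recall 0.667), giving an F1 of 4/7. A wrong polarity on a correctly extracted term costs both precision and recall, which is why the joint task is stricter than either subtask alone.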

