From TF-IDF to Transformers: A Comparative and Ensemble Approach to Sentiment Classification

From arXiv. 6 pages, 9 figures. This is the author's accepted manuscript, presented at the International Conference on Intelligent Computing, Networks and Security (IC-ICNS 2026), March 26-28, Bhubaneswar, India. Proceedings publication pending.

Sentiment analysis, also referred to as opinion mining, aims to extract opinions from text data. For movie reviews and criticism, sentiment analysis can help predict whether a review is broadly positive or negative. Because machine learning (ML) models rely largely on statistical word representations, they can struggle to capture context or more abstract sentiment accurately. This paper classifies movie reviews from the IMDb dataset as positive or negative. It compares a range of machine learning models and uses Natural Language Processing (NLP) techniques for data preprocessing and model assessment. Specifically, Naive Bayes, Logistic Regression, Support Vector Machines (SVM), LightGBM, LSTM, and transformer-based models such as RoBERTa and DistilBERT were evaluated. Measured on accuracy, precision, recall, F1-score, and ROC-AUC, RoBERTa outperformed all the other models, with an accuracy of 93.02%. A soft voting ensemble that combined all the models also improved classification performance, indicating that model ensembling works well for sentiment analysis.
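As a rough illustration of the soft voting idea mentioned in the abstract, the sketch below averages each model's predicted probability of the positive class and thresholds the mean, then scores the result with accuracy, precision, recall, and F1. It assumes equal weights and a 0.5 threshold, neither of which the abstract specifies. The function names, model keys, and probabilities are made up for illustration and are not results from the paper.

```python
from typing import Dict, List, Sequence


def soft_vote(prob_by_model: Dict[str, Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average each model's P(positive) per review and threshold the mean.

    Equal weighting is an assumption; a weighted average is also common.
    """
    columns = list(prob_by_model.values())
    if not columns:
        raise ValueError("at least one model is required")
    n_reviews = len(columns[0])
    if any(len(col) != n_reviews for col in columns):
        raise ValueError("every model must score the same reviews")
    labels = []
    for i in range(n_reviews):
        mean_prob = sum(col[i] for col in columns) / len(columns)
        labels.append(1 if mean_prob >= threshold else 0)
    return labels


def binary_metrics(y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with 1 = positive, 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical P(positive) from three models for four reviews (toy numbers).
toy_probs = {
    "naive_bayes": [0.90, 0.40, 0.30, 0.60],
    "logistic_regression": [0.80, 0.55, 0.20, 0.45],
    "roberta": [0.95, 0.70, 0.10, 0.35],
}
toy_labels = [1, 1, 0, 0]  # hypothetical ground truth

ensemble_pred = soft_vote(toy_probs)
ensemble_scores = binary_metrics(toy_labels, ensemble_pred)
print(ensemble_pred, ensemble_scores)
```

In this toy data the first model alone would misclassify the second and fourth reviews, while the averaged probabilities land on the correct side of the threshold, which is the effect an ensemble is hoping for. ROC-AUC is omitted here because it needs a ranking over scores rather than hard labels.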

