Sentiment Analysis on Movie Reviews: A Deep Dive into Modern Techniques and Open Challenges

This paper presents a comprehensive survey of sentiment analysis methods for movie reviews, a benchmark task that has played a central role in advancing natural language processing. We review the evolution of techniques from early lexicon-based and classical machine learning approaches to modern deep learning architectures and large language models. The survey covers widely used datasets such as IMDb, Rotten Tomatoes, and SST-2, and models ranging from Naive Bayes and support vector machines to LSTM networks and attention-based transformers such as BERT. Beyond summarizing prior work, this survey offers a comparative, challenge-driven analysis of how these modeling paradigms address domain-specific issues such as sarcasm, negation, contextual ambiguity, and domain shift, which remain open problems in the existing literature. Unlike earlier reviews that focus primarily on text-only pipelines, we also synthesize recent advances in multimodal sentiment analysis that integrate textual, audio, and visual cues from movie trailers and clips. In addition, we examine concerns related to interpretability, fairness, and robustness that prior surveys often leave underexplored, and we outline future research directions including zero-shot and few-shot learning, hybrid symbolic-neural models, and real-time deployment considerations. Overall, this survey provides a domain-focused roadmap that highlights both established solutions and unresolved challenges in building more accurate, generalizable, and explainable sentiment analysis systems for movie review data.
