This study addressed sentiment analysis on a dataset of 119,988 original Weibo posts using a Convolutional Neural Network (CNN), a standard Natural Language Processing (NLP) task. The data, sourced from Baidu's PaddlePaddle AI platform, were preprocessed, tokenized, and grouped by sentiment label. The CNN-based model used word embeddings for feature extraction and was trained to classify each post as positive, neutral, or negative. It achieved a macro-average F1-score of approximately 0.73 on the test set, indicating balanced performance across the three sentiment classes. The findings support the effectiveness of CNNs for sentiment analysis, with implications for social media analysis, market research, and policy studies. The complete experimental content and code are publicly available on the Kaggle data platform for further research and development. Future work may explore other architectures, such as Recurrent Neural Networks (RNNs) or transformers, or larger pre-trained models such as BERT, to better capture linguistic nuance and context.
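For readers unfamiliar with the reported metric, the macro-average F1-score is the unweighted mean of the per-class F1 scores, so each of the three sentiment classes counts equally regardless of how many posts it contains. The following is a minimal sketch of that calculation, not the study's evaluation code; the function name `macro_f1` and the toy labels are assumptions made up for illustration and are not data from the study.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a false positive
            fn[truth] += 1   # true class received a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2*TP / (2*TP + FP + FN); taken as 0 if the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy labels, invented for illustration only (hypothetical, not study data).
LABELS = ["negative", "neutral", "positive"]
y_true = ["positive", "positive", "neutral", "negative", "negative", "neutral"]
y_pred = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]

score = macro_f1(y_true, y_pred, LABELS)
print(round(score, 3))  # → 0.656
```

In this toy case the per-class F1 scores are about 0.667 (negative), 0.8 (neutral), and 0.5 (positive). A macro-average near the individual class scores, as with the reported 0.73, is what "balanced performance" refers to.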