Social media platforms such as Twitter (X) have become important sources of consumer insight, capturing diverse opinions and public attitudes. This research applies machine learning methods to sentiment analysis of Twitter data, with a focus on predicting consumer trends. Using the Sentiment140 dataset, with "car" as an example term, the study detects evolving patterns in consumer preferences. A structured workflow was used to clean and prepare the data for analysis. Four models were employed to classify sentiment and predict trends: Support Vector Machines (SVM), Naive Bayes, Long Short-Term Memory (LSTM) networks, and Bidirectional Encoder Representations from Transformers (BERT). Model performance was measured using accuracy, precision, recall, and F1 score, with BERT achieving the highest results (accuracy: 83.48%, precision: 79.37%, recall: 90.60%, F1: 84.61%). The results show that LSTM and BERT effectively capture linguistic and contextual patterns, improving prediction accuracy and providing insights into consumer behavior. Temporal analysis revealed sentiment shifts over time, while Named Entity Recognition (NER) identified related terms and themes. The research also addresses challenges such as sarcasm detection and multilingual data processing, offering a scalable framework for generating actionable consumer insights.
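As a quick consistency check on the reported metrics, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the helper name `f1_score` and the variable names are assumptions introduced here, not part of the study's code, and the inputs are the BERT precision and recall quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        # Avoid division by zero when the model predicts no true positives.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# BERT precision and recall as reported in the abstract (as fractions).
bert_precision = 0.7937
bert_recall = 0.9060

bert_f1 = f1_score(bert_precision, bert_recall)
print(f"BERT F1: {bert_f1:.2%}")  # prints "BERT F1: 84.61%"
```

The recomputed value agrees with the reported F1 of 84.61%, which suggests the F1 figure is a percentage on the same scale as the other three metrics.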