Innovative Sentiment Analysis and Prediction of Stock Price Using FinBERT, GPT-4 and Logistic Regression: A Data-Driven Approach

This study compares the performance of three AI models, Financial Bidirectional Encoder Representations from Transformers (FinBERT), Generative Pre-trained Transformer 4 (GPT-4), and Logistic Regression, for sentiment analysis and stock index prediction using financial news and the NGX All-Share Index data label. By combining advanced natural language processing models (GPT-4 and FinBERT) with a traditional machine learning model (Logistic Regression), we aim to classify market sentiment, generate sentiment scores, and predict market price movements. This research highlights global AI advancements in stock markets, showing how state-of-the-art language models can contribute to understanding complex financial data. The models were assessed using accuracy, precision, recall, F1 score, and ROC AUC. Results indicate that Logistic Regression outperformed both the more computationally intensive FinBERT and the predefined GPT-4 approach, with an accuracy of 81.83% and a ROC AUC of 89.76%. The predefined GPT-4 approach reached a lower accuracy of 54.19% but demonstrated strong potential in handling complex data. FinBERT, while offering more sophisticated analysis, was resource-demanding and yielded moderate performance. Hyperparameter optimization with Optuna and cross-validation were used to ensure the robustness of the models. This study highlights the strengths and limitations of AI approaches in practical stock market prediction and presents Logistic Regression as the most efficient model for this task, with FinBERT and GPT-4 representing emerging tools with potential for future exploration and innovation in AI-driven financial analytics.
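To make the evaluation metrics named in the abstract concrete, the sketch below computes accuracy, precision, recall, F1 score, and ROC AUC for a binary up/down prediction task using only the Python standard library. It is an illustration under stated assumptions, not the authors' evaluation code: the label encoding (1 = index rises, 0 = index falls), the 0.5 decision threshold, the function names, and the toy data are all hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (assumed 1 = up, 0 = down)."""
    # Turn predicted probabilities into hard labels at the assumed threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true movement labels and predicted probabilities of "up".
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.65, 0.3, 0.55, 0.2, 0.8, 0.1]

if __name__ == "__main__":
    print(classification_metrics(toy_labels, toy_scores))
    print("roc_auc:", roc_auc(toy_labels, toy_scores))
```

The first four metrics depend on the chosen threshold, whereas ROC AUC depends only on how the scores rank positives against negatives. This is why a model can report a ROC AUC noticeably higher than its accuracy, as in the Logistic Regression figures above.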

