This paper explores the application of Machine Learning (ML) and Natural Language Processing (NLP) techniques to cryptocurrency price forecasting, specifically for Bitcoin (BTC) and Ethereum (ETH). Focusing on news and social media data, primarily from Twitter and Reddit, we analyse the influence of public sentiment on cryptocurrency valuations using advanced deep learning NLP methods. Alongside conventional price regression, we treat cryptocurrency price forecasting as a classification problem, covering both the prediction of price movements (up or down) and the identification of local extrema. We compare the performance of various ML models with and without NLP data integration. Our findings show that incorporating NLP data significantly enhances the forecasting performance of our models. We find that pre-trained models, such as Twitter-RoBERTa and BART MNLI, are highly effective in capturing market sentiment, and that fine-tuning Large Language Models (LLMs) also yields substantial forecasting improvements. Notably, the BART MNLI zero-shot classification model shows considerable proficiency in extracting bullish and bearish signals from textual data. All of our models consistently generate profit across different validation scenarios, with no observed decline in profits or reduction in the impact of NLP data over time. The study highlights the potential of text analysis for improving financial forecasts and demonstrates the effectiveness of various NLP techniques in capturing nuanced market sentiment.
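To make the classification framing concrete, the following minimal sketch shows one way a price series could be turned into direction (up or down) labels and local-extrema labels. It is an illustration under stated assumptions, not the labelling procedure used in the paper: the function names `direction_labels` and `local_extrema_labels`, the `window` parameter, and the strict-inequality definition of an extremum are all hypothetical choices.

```python
from typing import List


def direction_labels(prices: List[float]) -> List[int]:
    """Return 1 where the next price is higher than the current one, else 0.

    The output has len(prices) - 1 entries, one per consecutive pair.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]


def local_extrema_labels(prices: List[float], window: int = 1) -> List[str]:
    """Label each point as 'max', 'min', or 'none'.

    Assumption: a point is a local maximum (minimum) if it is strictly
    greater (smaller) than every other price within `window` steps on
    both sides. Points without a full window on both sides get 'none'.
    """
    n = len(prices)
    labels = []
    for i in range(n):
        if i - window < 0 or i + window >= n:
            labels.append("none")  # incomplete neighbourhood at the edges
            continue
        neighbours = prices[i - window:i] + prices[i + 1:i + window + 1]
        if all(prices[i] > p for p in neighbours):
            labels.append("max")
        elif all(prices[i] < p for p in neighbours):
            labels.append("min")
        else:
            labels.append("none")
    return labels


if __name__ == "__main__":
    series = [100.0, 102.0, 101.0, 105.0, 103.0]  # made-up prices
    print(direction_labels(series))
    print(local_extrema_labels(series))
```

Labels of this kind could then serve as targets for a classifier, with sentiment features derived from text added as extra inputs; a larger `window` would make the extrema labels sparser and less sensitive to short-lived fluctuations.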